git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* DEVEL: Help with feature implementation
@ 2021-01-18 18:00 FriendlyNeighborhoodShane
  0 siblings, 0 replies; 19+ messages in thread
From: FriendlyNeighborhoodShane @ 2021-01-18 18:00 UTC (permalink / raw)
  To: git



^ permalink raw reply	[flat|nested] 19+ messages in thread

* DEVEL: Help with feature implementation
@ 2021-01-18 18:06 Aiyee Bee
  2021-01-18 18:19 ` Antonio Russo
  2021-01-18 18:26 ` Derrick Stolee
  0 siblings, 2 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-18 18:06 UTC (permalink / raw)
  To: git

Hello everybody!

I want to implement a feature in git, and I'm looking for some help.

I wanted to add a history-simplification option to rev-list so that it
doesn't simplify away any irrelevant commits if they have multiple
relevant _children_, i.e. when they are the point where two relevant
histories diverge.

Basically the effect I want for --show-forkpoints (named like --show-pulls):
http://ix.io/2Ms6

But it seems there is no existing apparatus in the revision walker for
deciding simplification on basis of *children*, am I correct? Admittedly
my understanding of it is still a WIP, but I don't see anything that
could help.

I was hoping that simply the flag CHILD_SHOWN could be checked, but it
seems that's only set on boundary commits :(

This option would be pretty useful when used with some diffs, to see how
much two forks have diverged. Currently if you use history simplification,
the diffs for both divergent histories will be created against the last
relevant commit instead of the last common ancestor, which creates pretty
useless diffs with a lot of intersection between them.

Are there any viable alternative means to do this that I can explore?

Thanks!

FriendlyNeighborhoodShane

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 18:06 DEVEL: Help with feature implementation Aiyee Bee
@ 2021-01-18 18:19 ` Antonio Russo
  2021-01-18 18:26 ` Derrick Stolee
  1 sibling, 0 replies; 19+ messages in thread
From: Antonio Russo @ 2021-01-18 18:19 UTC (permalink / raw)
  To: Aiyee Bee, git

On 1/18/21 11:06 AM, Aiyee Bee wrote:
> Hello everybody!
> 
> I want to implement a feature in git, and I'm looking for some help.
> 
> I wanted to add a history-simplification option to rev-list so that it
> doesn't simplify away any irrelevant commits if they have multiple
> relevant _children_, i.e. when they are the point where two relevant
> histories diverge.

I'm actually working on something that does the opposite---it ignores
the fork point when drawing the graph.  (I'm currently dog-fooding a
partial implementation of this.)

> Basically the effect I want for --show-forkpoints (named like --show-pulls):
> http://ix.io/2Ms6
> 
> But it seems there is no existing apparatus in the revision walker for
> deciding simplification on basis of *children*, am I correct? Admittedly
> my understanding of it is still a WIP, but I don't see anything that
> could help.

It's even worse than this.  AFAICT, there are two code paths: one for when
the git-commit-graph is enabled, and one when it is not.  There's a special
"revs->limited" option which lets you deal with the commits all at once, at
the expense of losing all the speed improvements from the git-commit-graph
efforts.

Aside: I want to get rid of that alternative path.  But I only have limited
time.

> I was hoping that simply the flag CHILD_SHOWN could be checked, but it
> seems that's only set on boundary commits :(
> 
> This option would be pretty useful when used with some diffs, to see how
> much two forks have diverged. Currently if you use history simplification,
> the diffs for both divergent histories will be created against the last
> relevant commit instead of the last common ancestor, which creates pretty
> useless diffs with a lot of intersection between them.

Sounds neat!

> Are there any viable alternative means to do this that I can explore?

I too would like to hear about suggestions for this.

> 
> Thanks!
> 
> FriendlyNeighborhoodShane
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 18:06 DEVEL: Help with feature implementation Aiyee Bee
  2021-01-18 18:19 ` Antonio Russo
@ 2021-01-18 18:26 ` Derrick Stolee
  2021-01-18 19:15   ` Aiyee Bee
                     ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Derrick Stolee @ 2021-01-18 18:26 UTC (permalink / raw)
  To: Aiyee Bee, git

On 1/18/2021 1:06 PM, Aiyee Bee wrote:
> Hello everybody!
> 
> I want to implement a feature in git, and I'm looking for some help.
> 
> I wanted to add a history-simplification option to rev-list so that it
> doesn't simplify away any irrelevant commits if they have multiple
> relevant _children_, i.e. when they are the point where two relevant
> histories diverge.
> 
> Basically the effect I want for --show-forkpoints (named like --show-pulls):
> http://ix.io/2Ms6

I think what you really want is --full-history --simplify-merges [1]. This
will show the merges that "fork" the history into parallel tracks where
at least two of them contain interesting commits.

Note that this option is expensive to compute, as it walks every
reachable commit before returning a single result.

Thanks,
-Stolee

[1] https://git-scm.com/docs/git-log#Documentation/git-log.txt---simplify-merges

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 18:26 ` Derrick Stolee
@ 2021-01-18 19:15   ` Aiyee Bee
  2021-01-18 22:54     ` Junio C Hamano
  2021-01-18 19:25   ` Aiyee Bee
  2021-01-18 19:31   ` Aiyee Bee
  2 siblings, 1 reply; 19+ messages in thread
From: Aiyee Bee @ 2021-01-18 19:15 UTC (permalink / raw)
  To: git

Hi Antonio and Derrick!

> I think what you really want is --full-history --simplify-merges [1]. This
> will show the merges that "fork" the history into parallel tracks where
> at least two of them contain interesting commits.

It doesn't look like the implementation of --simplify-merges helps much
here. That makes its decision on basis of the parents of the commit, which is
simple to do as it's information attached freely to each commit. I think the
problem here would be figuring out, given any commit, how many of its children
are "relevant" commits.

> I'm actually working on something that does the opposite---it ignores
> the fork point when drawing the graph.  (I'm currently dog-fooding a
> partial implementation of this.)

That's a pretty interesting coincidence :)

Just to throw ideas out there, maybe we could attach another commit_list,
children, to the commit metadata, so that all this becomes a little easier.
But I guess that's be pretty impractical and inefficient.

Maybe a more practical (but still pretty unusual) solution would be adding
counters to each commit that tell us how many times they have been traversed
Through various histories?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 18:26 ` Derrick Stolee
  2021-01-18 19:15   ` Aiyee Bee
@ 2021-01-18 19:25   ` Aiyee Bee
  2021-01-18 19:31   ` Aiyee Bee
  2 siblings, 0 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-18 19:25 UTC (permalink / raw)
  To: Derrick Stolee



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 18:26 ` Derrick Stolee
  2021-01-18 19:15   ` Aiyee Bee
  2021-01-18 19:25   ` Aiyee Bee
@ 2021-01-18 19:31   ` Aiyee Bee
  2021-01-18 20:58     ` Derrick Stolee
  2 siblings, 1 reply; 19+ messages in thread
From: Aiyee Bee @ 2021-01-18 19:31 UTC (permalink / raw)
  To: git

Hi Antonio and Derrick!

> I think what you really want is --full-history --simplify-merges [1]. This
> will show the merges that "fork" the history into parallel tracks where
> at least two of them contain interesting commits.

It doesn't look like the implementation of --simplify-merges helps much
here. That makes its decision on basis of the parents of the commit, which is
simple to do as it's information attached freely to each commit. I think the
problem here would be figuring out, given any commit, how many of its children
are "relevant" commits.

> I'm actually working on something that does the opposite---it ignores
> the fork point when drawing the graph.  (I'm currently dog-fooding a
> partial implementation of this.)

That's a pretty interesting coincidence :)

Just to throw ideas out there, maybe we could attach another commit_list,
children, to the commit metadata, so that all this becomes a little easier.
But I guess that's be pretty impractical and inefficient.

Maybe a more practical (but still pretty unusual) solution would be adding
counters to each commit that tell us how many times they have been traversed
Through various histories?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 19:31   ` Aiyee Bee
@ 2021-01-18 20:58     ` Derrick Stolee
  2021-01-19  0:54       ` Antonio Russo
  0 siblings, 1 reply; 19+ messages in thread
From: Derrick Stolee @ 2021-01-18 20:58 UTC (permalink / raw)
  To: Aiyee Bee, git

On 1/18/2021 2:31 PM, Aiyee Bee wrote:
> Hi Antonio and Derrick!
> 
>> I think what you really want is --full-history --simplify-merges [1]. This
>> will show the merges that "fork" the history into parallel tracks where
>> at least two of them contain interesting commits.
> 
> It doesn't look like the implementation of --simplify-merges helps much
> here. That makes its decision on basis of the parents of the commit, which is
> simple to do as it's information attached freely to each commit. I think the
> problem here would be figuring out, given any commit, how many of its children
> are "relevant" commits.

You should definitely give this a try instead of assuming things about the
implementation. The algorithm uses a lot of "simplifying" that makes it look
like the decision is a local one. However, I assure you that is not the case.

Please assemble a test case that demonstrates the behavior you want and how
that is different from what is present in --simplify-merges.

-Stolee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 19:15   ` Aiyee Bee
@ 2021-01-18 22:54     ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2021-01-18 22:54 UTC (permalink / raw)
  To: Aiyee Bee; +Cc: git

"Aiyee Bee" <shane.880088.supw@gmail.com> writes:

> Maybe a more practical (but still pretty unusual) solution would be adding
> counters to each commit that tell us how many times they have been traversed
> Through various histories?

The "--show-forkpoints" output shown there at http://ix.io/2Ms6
looks interesting.  I think you'd need only 1 bit counter
(i.e. "have we visited it only once, or more than once" bit) for
your purpose?  You traverse from F and Z and the first time you
reach C (probably through the F->E->D->C path), you see no SEEN flag
on the object so leave the new bit alone, and then when the traverse
reaches C again (this time through the Z->Y->C path), while stopping
further traversal because you notice C is already SEEN, you mark it
with the new bit so record that it has been visited at least twice.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-18 20:58     ` Derrick Stolee
@ 2021-01-19  0:54       ` Antonio Russo
  2021-01-19  1:24         ` Junio C Hamano
  2021-01-19  2:39         ` Derrick Stolee
  0 siblings, 2 replies; 19+ messages in thread
From: Antonio Russo @ 2021-01-19  0:54 UTC (permalink / raw)
  To: Derrick Stolee, Aiyee Bee, git

On 1/18/21 1:58 PM, Derrick Stolee wrote:
> On 1/18/2021 2:31 PM, Aiyee Bee wrote:
>> Hi Antonio and Derrick!
>>
>>> I think what you really want is --full-history --simplify-merges [1]. This
>>> will show the merges that "fork" the history into parallel tracks where
>>> at least two of them contain interesting commits.
>>
>> It doesn't look like the implementation of --simplify-merges helps much
>> here. That makes its decision on basis of the parents of the commit, which is
>> simple to do as it's information attached freely to each commit. I think the
>> problem here would be figuring out, given any commit, how many of its children
>> are "relevant" commits.
> 
> You should definitely give this a try instead of assuming things about the
> implementation. The algorithm uses a lot of "simplifying" that makes it look
> like the decision is a local one. However, I assure you that is not the case.

As a side note, would this list be willing to look at patches that remove
the need to use revs->limited?  Adding new features would be much easier if
we could restrict git to use streaming algorithms for these simplifications.

> Please assemble a test case that demonstrates the behavior you want and how
> that is different from what is present in --simplify-merges.

I can't figure out how to get the behavior from --simplify-merges, which is
described as

	Additional option to --full-history to remove some needless
	merges from  the resulting history, as there are no selected
	commits contributing to this merge.

It seems that the desired behavior is to include commits which are parents to
multiple branches.  Here is an example:

test_commit() {
 echo >> file
 git add file
 git commit "$@"
}

git init
test_commit -m a
test_commit -m b
test_commit -m c
git checkout -b fork
test_commit -m y
test_commit -m z
git switch master
test_commit -m d
test_commit -m e
test_commit -m f

git log --graph --oneline master fork

* 08029fd f
* 55b09fe e
* 83b7801 d
| * efc204e z
| * 316219e y
|/  
* 3594039 c
* 4321987 b
* bd44220 a

git log --graph --oneline --full-history --simplify-merges master fork

* 08029fd f
* 55b09fe e
* 83b7801 d
| * efc204e z
| * 316219e y
|/  
* 3594039 c
* 4321987 b
* bd44220 a

git log --graph --oneline --simplify-by-decoration --full-history --simplify-merges master fork

* 08029fd f
| * efc204e z
|/  
* bd44220 a

git log --graph --oneline --full-history --simplify-merges master fork

* 08029fd f
* 55b09fe e
* 83b7801 d
| * efc204e z
| * 316219e y
|/  
* 3594039 c
* 4321987 b
* bd44220 a

git --version
git version 2.30.0

I can't seem to get commit c, the crucial fork, to show up with simplifications with this mechanism.
Am I missing something here?

> 
> -Stolee
> 

Antonio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  0:54       ` Antonio Russo
@ 2021-01-19  1:24         ` Junio C Hamano
  2021-01-19 14:52           ` Antonio Russo
  2021-01-19  2:39         ` Derrick Stolee
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2021-01-19  1:24 UTC (permalink / raw)
  To: Antonio Russo; +Cc: Derrick Stolee, Aiyee Bee, git

Antonio Russo <aerusso@aerusso.net> writes:

> As a side note, would this list be willing to look at patches that remove
> the need to use revs->limited?  Adding new features would be much easier if
> we could restrict git to use streaming algorithms for these simplifications.

Do you mean you will write new implementations of operations like
"--simplify-merges", "--ancestory-path" and all its friends without
the need for the "limited" bit?

Willing to look at?  I do not know about others, but sure, it would
make be extremely excited.

You may be able to rewrite some other operations that implicitly
require the limited bit into an incremental implementation, but I am
skeptical that you can do "simplify-merges" in a streaming fashion
in such a way that we'd dig a bit but not all the way down before
making decisions on which commits should be included in the output
and how their parenthood relationship should appear etc.  I and
Linus tried independently and we did not get that far in our
attempts (note: there wasn't commit-graph back then).

If you are talking about precomputed stuff like commit-graph, that
may change the equation, but we'd prefer to have the system work
without requiring these auxiliary data (in other words, it would be
fine to use them as optimization).


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  0:54       ` Antonio Russo
  2021-01-19  1:24         ` Junio C Hamano
@ 2021-01-19  2:39         ` Derrick Stolee
  2021-01-19  5:35           ` Aiyee Bee
                             ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Derrick Stolee @ 2021-01-19  2:39 UTC (permalink / raw)
  To: Antonio Russo, Aiyee Bee, git

On 1/18/2021 7:54 PM, Antonio Russo wrote:
> On 1/18/21 1:58 PM, Derrick Stolee wrote:
>> On 1/18/2021 2:31 PM, Aiyee Bee wrote:
>>> Hi Antonio and Derrick!
>>>
>>>> I think what you really want is --full-history --simplify-merges [1]. This
>>>> will show the merges that "fork" the history into parallel tracks where
>>>> at least two of them contain interesting commits.
>>>
>>> It doesn't look like the implementation of --simplify-merges helps much
>>> here. That makes its decision on basis of the parents of the commit, which is
>>> simple to do as it's information attached freely to each commit. I think the
>>> problem here would be figuring out, given any commit, how many of its children
>>> are "relevant" commits.
>>
>> You should definitely give this a try instead of assuming things about the
>> implementation. The algorithm uses a lot of "simplifying" that makes it look
>> like the decision is a local one. However, I assure you that is not the case.
> 
> As a side note, would this list be willing to look at patches that remove
> the need to use revs->limited?  Adding new features would be much easier if
> we could restrict git to use streaming algorithms for these simplifications.

I would _love_ to see patches that remove that bit (without modifying
the behavior).

Fair warning: I definitely spent a few weeks attempting to do any amount
of reducing the depth one needs to walk in order to compute the
--simplify-merges history, but a sufficiently-complicated branch history
makes it nearly impossible to gain a benefit.

>> Please assemble a test case that demonstrates the behavior you want and how
>> that is different from what is present in --simplify-merges.
> 
> I can't figure out how to get the behavior from --simplify-merges, which is
> described as
> 
> 	Additional option to --full-history to remove some needless
> 	merges from  the resulting history, as there are no selected
> 	commits contributing to this merge.
> 
> It seems that the desired behavior is to include commits which are parents to
> multiple branches.  Here is an example:

Thank you for these examples. They clearly show that I misread your
ask, because you're not looking for "merge commits" but instead you
are looking to show the "merge bases" as history is walking.

Sorry for misinterpreting your request, then doubling down on it.

> test_commit() {
>  echo >> file
>  git add file
>  git commit "$@"
> }
> 
> git init
> test_commit -m a
> test_commit -m b
> test_commit -m c
> git checkout -b fork
> test_commit -m y
> test_commit -m z
> git switch master
> test_commit -m d
> test_commit -m e
> test_commit -m f
> 
> git log --graph --oneline master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git log --graph --oneline --full-history --simplify-merges master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git log --graph --oneline --simplify-by-decoration --full-history --simplify-merges master fork
> 
> * 08029fd f
> | * efc204e z
> |/  
> * bd44220 a
> 
> git log --graph --oneline --full-history --simplify-merges master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git --version
> git version 2.30.0
> 
> I can't seem to get commit c, the crucial fork, to show up with simplifications with this mechanism.
> Am I missing something here?

In your example, you are not specifying a path. In this case, you are
really looking for "git merge-base master fork". You could also use
"git log --boundary master...fork" to show everything up to and
including 'c'.

Now, if you specify a pathspec, then 'git merge-base' isn't going to
help. That becomes a technically interesting problem.

The biggest reason that "git log" doesn't show this commit 'c' easily
is because...it's not really that important. When that commit was
created, it didn't "know" that it would be a common base of two
diverging branches. By surfacing the commit, we are very unlikely to
present the user with information that is helpful.

One way to expose the "first layer" of these merge bases is with the
--boundary option.

Here is an example of two branches in the Git ecosystem that can
present an interesting real example: code in builtin/gc.c has been
changed in both 'master' from origin (https://github.com/git/git) and
in 'main' from windows (https://github.com/git-for-windows/git):

$ git log --no-decorate --oneline --graph origin/master...windows/main  -- builtin/gc.c
* b2ace18759c Merge branch 'ds/maintenance-part-4'
* c977ff44073 Merge branch 'pk/subsub-fetch-fix-take-2'
*   95124ead43a Merge branch 'main' into maintenance-and-headless
|\  
| * ff61c9b1f49 Start the merging-rebase to v2.30.0-rc2
| * afb16063390 Start the merging-rebase to v2.30.0-rc0
| * 13c9d54a45b Start the merging-rebase to v2.29.0-rc0
| * c00a524ea88 strvec: rename struct fields
| * a864a84f1ee strvec: fix indentation in renamed calls
| * 12922aa02db strvec: convert builtin/ callers away from argv_array name
| * 86f1b0f0e04 strvec: rename files from argv-array to strvec
* 5bef0904826 git maintenance: avoid console window in scheduled tasks on Windows

Something that is hard to see from this picture is that the first two
commits are not actually connected to the third. This is more visible
when adding --boundary:

$ git log --no-decorate --oneline --graph --boundary origin/master...windows/main -- builtin/gc.c
*   b2ace18759c Merge branch 'ds/maintenance-part-4'
|\  
* \   c977ff44073 Merge branch 'pk/subsub-fetch-fix-take-2'
|\ \  
| | | *   95124ead43a Merge branch 'main' into maintenance-and-headless
| | | |\  
| | | | *   ff61c9b1f49 Start the merging-rebase to v2.30.0-rc2
| | | | |\  
| | | | | *   afb16063390 Start the merging-rebase to v2.30.0-rc0
| | | | | |\  
| | | | | | *   13c9d54a45b Start the merging-rebase to v2.29.0-rc0
| | | | | | |\  
| | | | | | | * c00a524ea88 strvec: rename struct fields
| | | | | | | * a864a84f1ee strvec: fix indentation in renamed calls
| | | | | | | * 12922aa02db strvec: convert builtin/ callers away from argv_array name
| | | | | | | * 86f1b0f0e04 strvec: rename files from argv-array to strvec
| | | * | | | | 5bef0904826 git maintenance: avoid console window in scheduled tasks on Windows
| | |/ / / / /  
| | | | | | o 47ae905ffb9 Git 2.28
| | | | | o d98273ba77e Git 2.29-rc0
| | | | o 1c52ecf4ba0 Git 2.30-rc0
| | | o 4a0de43f492 Git 2.30-rc2
| o | 898f80736c7 Git 2.29.2
|  /  
o / 71ca53e8125 Git 2.30
 /  
o 3797a0a7b7a maintenance: use Windows scheduled tasks

Notice that all these boundaries are actually adding "uninteresting"
bases _and_ merges.

Probably, the suggested --show-forkpoints would need to start from
something like this and simplify it further.

The problem that I see as well is what happens as the history itself
branches repeatedly. Consider the history of builtin/gc.c again, this
time with a single starting ref. (I have inserted places where your
fork points might add commits to the first few results.)

$ git log --no-decorate --oneline --graph origin/master -- builtin/gc.c
*   b2ace18759c Merge branch 'ds/maintenance-part-4'
|\  
| * 3797a0a7b7a maintenance: use Windows scheduled tasks
| * 2afe7e35672 maintenance: use launchctl on macOS
| * 31345d5545e maintenance: extract platform-specific scheduling
* | 66dc0a3625e gc: fix handling of crontab magic markers
* |   f2a75cb312d Merge branch 'rs/maintenance-run-outside-repo'
|\ \  
| * | e72f7defc4f maintenance: fix SEGFAULT when no repository
* | |   d702cb9e890 Merge branch 'ds/maintenance-part-3'
|\ \ \  
| * | | 483a6d9b5da maintenance: use 'git config --fixed-value'
* | | |   1c04cdd424b Merge branch 'ab/gc-keep-base-option'
|\ \ \ \  
| |_|/ /  
|/| | |   

* | | |  (some fork point here)

| * | | 793c1464d3e gc: rename keep_base_pack variable for --keep-largest-pack
* | | | a1c74791d5f gc: fix cast in compare_tasks_by_selection()
| |/ /  
|/| |   

* | |    (some fork point here)

* | |   7660da16182 Merge branch 'ds/maintenance-part-3'
|\ \ \  
| | |/  
| |/|   

| * |    (some fork point here)

| * | 61f7a383d3b maintenance: use 'incremental' strategy by default
| * | a4cb1a2339c maintenance: create maintenance.strategy config
| * | 2fec604f8df maintenance: add start/stop subcommands
| * | 0c18b700810 maintenance: add [un]register subcommands
| * | b08ff1fee00 maintenance: add --schedule option and config
* | |   902f358555c Merge branch 'rs/clear-commit-marks-in-repo

When using file history simplification, there is actually a very
simple way to detect whether a commit is a fork point (assuming
that you are walking in generation number order):

- As you walk and see that commit C is not TREESAME to commit D,
  increment a "walked indegree" count on D. (Note: this would be
  different from the current "indegree" slab used in topo-order
  calculations.)

- As you visit a commit, check its indegree. If this is larger
  than 1, then it is a forkpoint of two "interesting" histories.

This entire procedure relies on the fact that simplified history
only visits multiple parents if the merge commit is not TREESAME
to _any_ parent. Otherwise, it walks only to its first TREESAME
parent, "simplifying away" the other parents.

To implement your fork-points in the case of --full-history
or --simplify-merges would require something much more sophisticated.

   (taking an extra pause here to shift gears)

But I think the --show-forkpoints option needs to be justified a
bit better. I'm still not convinced that this is actually
interesting information, unless you are literally looking for a
merge base or a log boundary, in which case there are ways to
expose those values.

The example of --show-pulls has such a justification:

  If a user wants to see which merge commits introduced the
  interesting commits into this history, then --show-pulls
  presents those merges. Such merges might have extra
  information in their commit messages, such as pull request
  IDs, topic branch names, or information about tricky merge
  conflict resolutions.

Is there something similar to justify --show-forkpoint?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
@ 2021-01-19  4:58 Aiyee Bee
  0 siblings, 0 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-19  4:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee, Antonio Russo

Hi Derrick and Junio!

Thanks for confirming my suspicions, that there isn't existing apparatus
for doing this, and a new bit can be added as a flag. Thanks for the
*pointers, I'll try working it out! With Antonio, in fact, since they'd
need the same thing for implementing the opposite feature.

As for the justification, to elaborate on what I said in the opening mail:

This option would be pretty useful when used with diff options, to see how
much two forks have diverged. I recently wanted to compare two long-running
forks of a project and see how far they have diverged. I used the options:
--simplufy-by-decoration --decorate-refs="refs/remotes/*/main" -p

But currently with this history simplification, the diffs for both divergent
histories will be created against the last "relevant" (decorated) commit
instead of the last common ancestor, which creates pretty useless diffs with
a lot of intersection between them.

You're right that the option doesn't sound very useful when thought of in
regards to history limiting by path, but IMO it's pretty useful in cases
like these, when you use simplify to get an 'overview' of the history.

Also sorry for the blank/duplicate emails everyone! My email client seems to
be misbehaving, largely because I tried to tinker with it :P

- Aiyee Bee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
@ 2021-01-19  5:20 Aiyee Bee
  0 siblings, 0 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-19  5:20 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Antonio Russo

Forwarded message from Aiyee Bee on Tue Jan 19, 2021 at 4:58 AM:
Hi Derrick and Junio!

Thanks for confirming my suspicions, that there isn't existing apparatus
for doing this, and a new bit can be added as a flag. Thanks for the
*pointers, I'll try working it out! With Antonio, in fact, since they'd
need the same thing for implementing the opposite feature.

As for the justification, to elaborate on what I said in the opening
mail:

This option would be pretty useful when used with diff options, to see
how
much two forks have diverged. I recently wanted to compare two
long-running
forks of a project and see how far they have diverged. I used the
options:
--simplufy-by-decoration --decorate-refs="refs/remotes/*/main" -p

But currently with this history simplification, the diffs for both
divergent
histories will be created against the last "relevant" (decorated) commit
instead of the last common ancestor, which creates pretty useless diffs
with
a lot of intersection between them.

You're right that the option doesn't sound very useful when thought of
in
regards to history limiting by path, but IMO it's pretty useful in cases
like these, when you use simplify to get an 'overview' of the history.

Also sorry for the blank/duplicate emails everyone! My email client
seems to
be misbehaving, largely because I tried to tinker with it :P

- Aiyee Bee


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  2:39         ` Derrick Stolee
@ 2021-01-19  5:35           ` Aiyee Bee
  2021-01-19  5:38           ` Aiyee Bee
  2021-01-19 15:13           ` Antonio Russo
  2 siblings, 0 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-19  5:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Antonio Russo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  2:39         ` Derrick Stolee
  2021-01-19  5:35           ` Aiyee Bee
@ 2021-01-19  5:38           ` Aiyee Bee
  2021-01-19 15:13           ` Antonio Russo
  2 siblings, 0 replies; 19+ messages in thread
From: Aiyee Bee @ 2021-01-19  5:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Antonio Russo

Hi Derrick and Junio!

Thanks for confirming my suspicions, that there isn't existing apparatus
for doing this, and a new bit can be added as a flag. Thanks for the
*pointers, I'll try working it out! With Antonio, in fact, since they'd
need the same thing for implementing the opposite feature.

As for the justification, to elaborate on what I said in the opening mail:

This option would be pretty useful when used with diff options, to see how
much two forks have diverged. I recently wanted to compare two long-running
forks of a project and see how far they have diverged. I used the options:
--simplufy-by-decoration --decorate-refs="refs/remotes/*/main" -p

But currently with this history simplification, the diffs for both divergent
histories will be created against the last "relevant" (decorated) commit
instead of the last common ancestor, which creates pretty useless diffs with
a lot of intersection between them.

You're right that the option doesn't sound very useful when thought of in
regards to history limiting by path, but IMO it's pretty useful in cases
like these, when you use simplify to get an 'overview' of the history.

Also sorry for the blank/duplicate emails everyone! My email client seems to
be misbehaving, largely because I tried to tinker with it :P

- Aiyee Bee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  1:24         ` Junio C Hamano
@ 2021-01-19 14:52           ` Antonio Russo
  2021-01-20  2:21             ` Derrick Stolee
  0 siblings, 1 reply; 19+ messages in thread
From: Antonio Russo @ 2021-01-19 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Aiyee Bee, git

On 1/18/21 6:24 PM, Junio C Hamano wrote:
> Antonio Russo <aerusso@aerusso.net> writes:
> 
>> As a side note, would this list be willing to look at patches that remove
>> the need to use revs->limited?  Adding new features would be much easier if
>> we could restrict git to use streaming algorithms for these simplifications.
> 
> Do you mean you will write new implementations of operations like
> "--simplify-merges", "--ancestory-path" and all its friends without
> the need for the "limited" bit?

Yes.

> Willing to look at?  I do not know about others, but sure, it would
> make be extremely excited.
> 
> You may be able to rewrite some other operations that implicitly
> require the limited bit into an incremental implementation, but I am
> skeptical that you can do "simplify-merges" in a streaming fashion
> in such a way that we'd dig a bit but not all the way down before
> making decisions on which commits should be included in the output
> and how their parenthood relationship should appear etc.  I and
> Linus tried independently and we did not get that far in our
> attempts (note: there wasn't commit-graph back then).

Yeah, it's a big task (but, it'd be useful work, rather than doing
another rewrite of my feature to make it work when revs->limited).
Each of the flags (simplify-merges, ancestry-path, etc.) is its own
little sub-project.

I haven't thought about any one flag specifically, but the fact that
two complete code paths exist right now just seem like a nightmare.
I'm not necessarily interested in making the implementations faster,
but rather getting rid of the duplicated code path.

It's also a long-term goal, but it's nice to hear that people would
be interested in it.

> If you are talking about precomputed stuff like commit-graph, that
> may change the equation, but we'd prefer to have the system work
> without requiring these auxiliary data (in other words, it would be
> fine to use them as optimization).

No, I understand that generation numbers can only be used as an
optimization (i.e., they might all be GENERATION_NUMBER_INFINITY).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19  2:39         ` Derrick Stolee
  2021-01-19  5:35           ` Aiyee Bee
  2021-01-19  5:38           ` Aiyee Bee
@ 2021-01-19 15:13           ` Antonio Russo
  2 siblings, 0 replies; 19+ messages in thread
From: Antonio Russo @ 2021-01-19 15:13 UTC (permalink / raw)
  To: Derrick Stolee, Aiyee Bee, git



On 1/18/21 7:39 PM, Derrick Stolee wrote:
> On 1/18/2021 7:54 PM, Antonio Russo wrote:
>> On 1/18/21 1:58 PM, Derrick Stolee wrote:
>>> On 1/18/2021 2:31 PM, Aiyee Bee wrote:
>>>> Hi Antonio and Derrick!
>>>>
>>>>> I think what you really want is --full-history --simplify-merges [1]. This
>>>>> will show the merges that "fork" the history into parallel tracks where
>>>>> at least two of them contain interesting commits.
>>>>
>>>> It doesn't look like the implementation of --simplify-merges helps much
>>>> here. That makes its decision on basis of the parents of the commit, which is
>>>> simple to do as it's information attached freely to each commit. I think the
>>>> problem here would be figuring out, given any commit, how many of its children
>>>> are "relevant" commits.
>>>
>>> You should definitely give this a try instead of assuming things about the
>>> implementation. The algorithm uses a lot of "simplifying" that makes it look
>>> like the decision is a local one. However, I assure you that is not the case.
>>
>> As a side note, would this list be willing to look at patches that remove
>> the need to use revs->limited?  Adding new features would be much easier if
>> we could restrict git to use streaming algorithms for these simplifications.
> 
> I would _love_ to see patches that remove that bit (without modifying
> the behavior).
> 
> Fair warning: I definitely spent a few weeks attempting to do any amount
> of reducing the depth one needs to walk in order to compute the
> --simplify-merges history, but a sufficiently-complicated branch history
> makes it nearly impossible to gain a benefit.

The goal I had in mind was just to remove the alternate code path, making
new features easier to write (i.e., you don't have to do them twice).

>>> Please assemble a test case that demonstrates the behavior you want and how
>>> that is different from what is present in --simplify-merges.
>>
>> I can't figure out how to get the behavior from --simplify-merges, which is
>> described as
>>
>> 	Additional option to --full-history to remove some needless
>> 	merges from  the resulting history, as there are no selected
>> 	commits contributing to this merge.
>>
>> It seems that the desired behavior is to include commits which are parents to
>> multiple branches.  Here is an example:
> 
> Thank you for these examples. They clearly show that I misread your
> ask, because you're not looking for "merge commits" but instead you
> are looking to show the "merge bases" as history is walking.
> 
> Sorry for misinterpreting your request, then doubling down on it.

No problem! (Just to be clear, the is a request of shane.880088.supw,
not me.)

> 
>> test_commit() {
>>  echo >> file
>>  git add file
>>  git commit "$@"
>> }
>>
>> git init
>> test_commit -m a
>> test_commit -m b
>> test_commit -m c
>> git checkout -b fork
>> test_commit -m y
>> test_commit -m z
>> git switch master
>> test_commit -m d
>> test_commit -m e
>> test_commit -m f
>>
>> git log --graph --oneline master fork
>>
>> * 08029fd f
>> * 55b09fe e
>> * 83b7801 d
>> | * efc204e z
>> | * 316219e y
>> |/  
>> * 3594039 c
>> * 4321987 b
>> * bd44220 a
>>
>> git log --graph --oneline --full-history --simplify-merges master fork
>>
>> * 08029fd f
>> * 55b09fe e
>> * 83b7801 d
>> | * efc204e z
>> | * 316219e y
>> |/  
>> * 3594039 c
>> * 4321987 b
>> * bd44220 a
>>
>> git log --graph --oneline --simplify-by-decoration --full-history --simplify-merges master fork
>>
>> * 08029fd f
>> | * efc204e z
>> |/  
>> * bd44220 a
>>
>> git log --graph --oneline --full-history --simplify-merges master fork
>>
>> * 08029fd f
>> * 55b09fe e
>> * 83b7801 d
>> | * efc204e z
>> | * 316219e y
>> |/  
>> * 3594039 c
>> * 4321987 b
>> * bd44220 a
>>
>> git --version
>> git version 2.30.0
>>
>> I can't seem to get commit c, the crucial fork, to show up with simplifications with this mechanism.
>> Am I missing something here?
> 
> In your example, you are not specifying a path. In this case, you are
> really looking for "git merge-base master fork". You could also use
> "git log --boundary master...fork" to show everything up to and
> including 'c'.
> 
> Now, if you specify a pathspec, then 'git merge-base' isn't going to
> help. That becomes a technically interesting problem.
> 
> The biggest reason that "git log" doesn't show this commit 'c' easily
> is because...it's not really that important. When that commit was
> created, it didn't "know" that it would be a common base of two
> diverging branches. By surfacing the commit, we are very unlikely to
> present the user with information that is helpful.

I think shane.880088.supw's point was that it's importance is, exactly as
you point out, not locally computable, only arising because it is a merge
base.

[snip (but an interesting read)]

> 
> Thanks,
> -Stolee
>
Antonio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: DEVEL: Help with feature implementation
  2021-01-19 14:52           ` Antonio Russo
@ 2021-01-20  2:21             ` Derrick Stolee
  0 siblings, 0 replies; 19+ messages in thread
From: Derrick Stolee @ 2021-01-20  2:21 UTC (permalink / raw)
  To: Antonio Russo, Junio C Hamano; +Cc: Aiyee Bee, git

On 1/19/2021 9:52 AM, Antonio Russo wrote:
> On 1/18/21 6:24 PM, Junio C Hamano wrote:
>> Antonio Russo <aerusso@aerusso.net> writes:
>>
>>> As a side note, would this list be willing to look at patches that remove
>>> the need to use revs->limited?  Adding new features would be much easier if
>>> we could restrict git to use streaming algorithms for these simplifications.
>>
>> Do you mean you will write new implementations of operations like
>> "--simplify-merges", "--ancestory-path" and all its friends without
>> the need for the "limited" bit?
> 
> Yes.
> 
>> Willing to look at?  I do not know about others, but sure, it would
>> make be extremely excited.
>>
>> You may be able to rewrite some other operations that implicitly
>> require the limited bit into an incremental implementation, but I am
>> skeptical that you can do "simplify-merges" in a streaming fashion
>> in such a way that we'd dig a bit but not all the way down before
>> making decisions on which commits should be included in the output
>> and how their parenthood relationship should appear etc.  I and
>> Linus tried independently and we did not get that far in our
>> attempts (note: there wasn't commit-graph back then).
> 
> Yeah, it's a big task (but, it'd be useful work, rather than doing
> another rewrite of my feature to make it work when revs->limited).
> Each of the flags (simplify-merges, ancestry-path, etc.) is its own
> little sub-project.
> 
> I haven't thought about any one flag specifically, but the fact that
> two complete code paths exist right now just seem like a nightmare.
> I'm not necessarily interested in making the implementations faster,
> but rather getting rid of the duplicated code path.

It's definitely a long shot to remove the limited flag altogether,
but a good start would be to remove sort_in_topological_order().
All of its logic is already re-implemented in the streaming version
(see every use of "struct topo_walk_info" in revision.c). The
streaming is designed to be faster in the presence of a commit-graph
file, but it should still work correctly without one. Since all
commits with have "generation number infinity," each phase of the
walk will do the same as the limit_list() and walk all reachable
commits before returning the first result.

The only thing to ask is: what is the overhead of the streaming
version over sort_in_topological_order()? Is there one? You'd need
to do performance tests without a commit-graph file.

This has long been on my own TODO list, but I haven't been able to
prioritize it over other things. I'd be happy to review the change.
It also would be a good way to familiarize yourself with the code
and how we already do some streaming things, even when "extra"
information is needed to accomplish that.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-01-20  2:27 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-18 18:06 DEVEL: Help with feature implementation Aiyee Bee
2021-01-18 18:19 ` Antonio Russo
2021-01-18 18:26 ` Derrick Stolee
2021-01-18 19:15   ` Aiyee Bee
2021-01-18 22:54     ` Junio C Hamano
2021-01-18 19:25   ` Aiyee Bee
2021-01-18 19:31   ` Aiyee Bee
2021-01-18 20:58     ` Derrick Stolee
2021-01-19  0:54       ` Antonio Russo
2021-01-19  1:24         ` Junio C Hamano
2021-01-19 14:52           ` Antonio Russo
2021-01-20  2:21             ` Derrick Stolee
2021-01-19  2:39         ` Derrick Stolee
2021-01-19  5:35           ` Aiyee Bee
2021-01-19  5:38           ` Aiyee Bee
2021-01-19 15:13           ` Antonio Russo
  -- strict thread matches above, loose matches on Subject: below --
2021-01-19  5:20 Aiyee Bee
2021-01-19  4:58 Aiyee Bee
2021-01-18 18:00 FriendlyNeighborhoodShane

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).