Splitting a rev list into 2 sets

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Splitting a rev list into 2 sets
@ 2013-06-20 10:14 Francis Moreau
  2013-06-20 11:26 ` Ramkumar Ramachandra
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Francis Moreau @ 2013-06-20 10:14 UTC (permalink / raw)
  To: git

Hello,

I'd like to write a script that would parse commits in one of my repo.
Ideally this script should accept any revision ranges that
git-rev-list would accept.

This script should consider commits in master differently than the
ones in others branches.

To get the commit set which can't be reached by master (ie commits
which are specific to branches other than master) I would do:

  # "$@" is the range spec passed to the script
  git rev-list "$@" ^master | check_other_commit

But I don't know if it's possible to use a different git-rev-list
command to get the rest of the commits, ie the ones that are reachable
by the specified range and master.

One way to do that is to record the first commit set got by the first
rev-list command and check that the ones returned by "git rev-list $@"
are not in the record.

But I'm wondering if someone can see another solution more elegant ?

Thanks
--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 10:14 Splitting a rev list into 2 sets Francis Moreau
@ 2013-06-20 11:26 ` Ramkumar Ramachandra
  2013-06-20 13:12   ` Francis Moreau
  2013-06-20 13:04 ` Phil Hord
  2013-06-20 13:20 ` Thomas Rast
  2 siblings, 1 reply; 12+ messages in thread
From: Ramkumar Ramachandra @ 2013-06-20 11:26 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git

Francis Moreau wrote:
> To get the commit set which can't be reached by master (ie commits
> which are specific to branches other than master) I would do:
>
>   # "$@" is the range spec passed to the script
>   git rev-list "$@" ^master | check_other_commit
>
> But I don't know if it's possible to use a different git-rev-list
> command to get the rest of the commits, ie the ones that are reachable
> by the specified range and master.
>
> One way to do that is to record the first commit set got by the first
> rev-list command and check that the ones returned by "git rev-list $@"
> are not in the record.

I don't fully understand your query, because almost anything is
possible with rev-list:

  $ git rev-list foo..bar master # reachable from master, bar, not foo

What I _suspect_ you're asking is for help when you can't construct
this "foo..bar master" programmatically (or when you cannot express
your criterion as arguments to rev-list).  You want an initial commit
set, and filter it at various points in your program using various
criteria, right?  In that case, I'd suggest something like this:

    # Returns a list of commits given a committish that `rev-list`
    # accepts.
    def self.list_commits(committish)
        commits = []
        revlist = execute("git", "rev-list", "--reverse", "--date-order",
                          "--simplify-merges", committish).chomp.split("\n")

        # do it in batches of 1000 commits
        while revlist
            these_revs = revlist.first(1000).join("\n")
            this_chunk = execute({ :in => these_revs }, "git",
                               "cat-file", "--batch")

            # parse_cat_file parses the chunk and updates @commit_index
            parse_cat_file(this_chunk) { |struct| commits << struct }

            revlist = revlist[1000 .. revlist.length - 1]
        end
        return commits
    end

    # Filters a list of commits with the precondition that it exists
    # in the committish.  :sha1 is used to uniquely identify a commit.
    def self.filter_commits(commits, committish)
        revlist = execute("git", "rev-list", "--simplify-merges",
                                 committish).split("\n")
        allowed_commits = revlist.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

In essence, I use '&' to filter and it's extremely fast.  The trick is
to shell out to git sparingly, store the data you get in a sensible
manner, and build fast custom filters based on what you want.  Here
are a few more examples:

    # Filters a list of commits with the precondition that it is a
    # first-parent commit in a given committish.
    def self.filter_fp_commits(commits, committish)
        revlist = execute("git", "rev-list", "--first-parent",
                          "--simplify-merges", committish).split("\n")
        allowed_commits = revlist.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

    # Slice a list of commits using a start_hex and end_hex, which
    # may both be nil.
    def self.slice_commits(commits, start_commit, end_commit)
        start_idx = commits.index(start_commit)
        end_idx = commits.index(end_commit)
        start_idx = 0 if start_idx.nil?
        end_idx = commits.size - 1 if end_idx.nil?
        return commits[start_idx..end_idx]
    end

    def self.filter_commits_tree_path(commits, path)
        commit_chunk = (commits.map { |commit| commit.sha1 }).join("\n")
        commit_chunk = "#{commit_chunk}\n"
        diff_tree_chunk = execute({ :in => commit_chunk }, "git", "diff-tree", \
                                  "-m", "-r", "-s", "--stdin", path)
        matching_sha1s = diff_tree_chunk.split("\n")
        allowed_commits = matching_sha1s.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

Did that help?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 10:14 Splitting a rev list into 2 sets Francis Moreau
  2013-06-20 11:26 ` Ramkumar Ramachandra
@ 2013-06-20 13:04 ` Phil Hord
  2013-06-20 13:17   ` Francis Moreau
  2013-06-20 13:20 ` Thomas Rast
  2 siblings, 1 reply; 12+ messages in thread
From: Phil Hord @ 2013-06-20 13:04 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git@vger.kernel.org

On Thu, Jun 20, 2013 at 6:14 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> I'd like to write a script that would parse commits in one of my repo.
> Ideally this script should accept any revision ranges that
> git-rev-list would accept.
>
> This script should consider commits in master differently than the
> ones in others branches.
>
> To get the commit set which can't be reached by master (ie commits
> which are specific to branches other than master) I would do:
>
>   # "$@" is the range spec passed to the script
>   git rev-list "$@" ^master | check_other_commit
>
> But I don't know if it's possible to use a different git-rev-list
> command to get the rest of the commits, ie the ones that are reachable
> by the specified range and master.
>
> One way to do that is to record the first commit set got by the first
> rev-list command and check that the ones returned by "git rev-list $@"
> are not in the record.
>
> But I'm wondering if someone can see another solution more elegant ?

I do not know if I would call this elegant, but I think this
codification of your "One way to do that" is at least small and mostly
readable:

   git rev-list "$@" |grep -v -f <(git rev-list "$@" ^master)

Phil

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 11:26 ` Ramkumar Ramachandra
@ 2013-06-20 13:12   ` Francis Moreau
  2013-06-20 13:47     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 12+ messages in thread
From: Francis Moreau @ 2013-06-20 13:12 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: git

On Thu, Jun 20, 2013 at 1:26 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Francis Moreau wrote:
>> To get the commit set which can't be reached by master (ie commits
>> which are specific to branches other than master) I would do:
>>
>>   # "$@" is the range spec passed to the script
>>   git rev-list "$@" ^master | check_other_commit
>>
>> But I don't know if it's possible to use a different git-rev-list
>> command to get the rest of the commits, ie the ones that are reachable
>> by the specified range and master.
>>
>> One way to do that is to record the first commit set got by the first
>> rev-list command and check that the ones returned by "git rev-list $@"
>> are not in the record.
>
> I don't fully understand your query, because almost anything is
> possible with rev-list:
>
>   $ git rev-list foo..bar master # reachable from master, bar, not foo
>
> What I _suspect_ you're asking is for help when you can't construct
> this "foo..bar master" programmatically (or when you cannot express
> your criterion as arguments to rev-list).  You want an initial commit
> set, and filter it at various points in your program using various
> criteria, right?

Yes, I would like to be sure that I haven't missed some magic syntax
for rev-list before going further in my poor man solution :)

Basically I have an initial set (or can be several different sets)
expressed as a revision specification described by git-rev-list man
page. I just want to find the common set of commit which are part of
the initial sets *and* is reachable by master.

I would write it:

     git rev-list "$@" --and master

> In that case, I'd suggest something like this:

Thanks for the details example.

--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 13:04 ` Phil Hord
@ 2013-06-20 13:17   ` Francis Moreau
  0 siblings, 0 replies; 12+ messages in thread
From: Francis Moreau @ 2013-06-20 13:17 UTC (permalink / raw)
  To: Phil Hord; +Cc: git@vger.kernel.org

Hi,

On Thu, Jun 20, 2013 at 3:04 PM, Phil Hord <phil.hord@gmail.com> wrote:
> On Thu, Jun 20, 2013 at 6:14 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>> I'd like to write a script that would parse commits in one of my repo.
>> Ideally this script should accept any revision ranges that
>> git-rev-list would accept.
>>
>> This script should consider commits in master differently than the
>> ones in others branches.
>>
>> To get the commit set which can't be reached by master (ie commits
>> which are specific to branches other than master) I would do:
>>
>>   # "$@" is the range spec passed to the script
>>   git rev-list "$@" ^master | check_other_commit
>>
>> But I don't know if it's possible to use a different git-rev-list
>> command to get the rest of the commits, ie the ones that are reachable
>> by the specified range and master.
>>
>> One way to do that is to record the first commit set got by the first
>> rev-list command and check that the ones returned by "git rev-list $@"
>> are not in the record.
>>
>> But I'm wondering if someone can see another solution more elegant ?
>
> I do not know if I would call this elegant, but I think this
> codification of your "One way to do that" is at least small and mostly
> readable:
>
>    git rev-list "$@" |grep -v -f <(git rev-list "$@" ^master)
>

Yes, thanks.

But I wanted to be sure that git-rev-list can't display the
intersection of several sets before going forward.

--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 10:14 Splitting a rev list into 2 sets Francis Moreau
  2013-06-20 11:26 ` Ramkumar Ramachandra
  2013-06-20 13:04 ` Phil Hord
@ 2013-06-20 13:20 ` Thomas Rast
  2013-06-20 16:24   ` Francis Moreau
  2 siblings, 1 reply; 12+ messages in thread
From: Thomas Rast @ 2013-06-20 13:20 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git

Francis Moreau <francis.moro@gmail.com> writes:

> Hello,
>
> I'd like to write a script that would parse commits in one of my repo.
> Ideally this script should accept any revision ranges that
> git-rev-list would accept.
>
> This script should consider commits in master differently than the
> ones in others branches.
>
> To get the commit set which can't be reached by master (ie commits
> which are specific to branches other than master) I would do:
>
>   # "$@" is the range spec passed to the script
>   git rev-list "$@" ^master | check_other_commit
>
> But I don't know if it's possible to use a different git-rev-list
> command to get the rest of the commits, ie the ones that are reachable
> by the specified range and master.
>
> One way to do that is to record the first commit set got by the first
> rev-list command and check that the ones returned by "git rev-list $@"
> are not in the record.
>
> But I'm wondering if someone can see another solution more elegant ?

I think there's a cute way.  Suppose your arguments are of the form

  p1 p2 ... --not n1 n2 ...

that is each pX is positive, and each nX is negative.  Then as you
observed, building the difference with master is easy: just add it to
the negative args.

Intersecting with master is harder, because you don't know what parts of
it (if any) are in the range.  But the --boundary option can help: these
are the commits where the positive and negative ranges "first" met, and
prevented the walk from continuing.

So the part of master reachable from p1, p2, etc. is exactly the set of
boundary commits of 'p1 p2 ... ^master'.  And on top of that, excluding
the parts reachable from the n's is easy.  So you can do:

  positive=$(git rev-parse "$@" | grep -v '^\^')
  negative=$(git rev-parse "$@" | grep '^\^')
  boundary=$(git rev-list --boundary $positive ^master | sed -n 's/^-//p')
  # the intersection is
  git rev-list $boundary $negative

I haven't tested it much, however.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 13:12   ` Francis Moreau
@ 2013-06-20 13:47     ` Ramkumar Ramachandra
  2013-06-21  7:15       ` Francis Moreau
  0 siblings, 1 reply; 12+ messages in thread
From: Ramkumar Ramachandra @ 2013-06-20 13:47 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git

Francis Moreau wrote:
> Basically I have an initial set (or can be several different sets)
> expressed as a revision specification described by git-rev-list man
> page. I just want to find the common set of commit which are part of
> the initial sets *and* is reachable by master.

That's just a generic list intersection between

  [a, b, c] and [d, e, f]

no?  [a, b, c] is a list you built up somehow, and [d, e, f] comes
from $(git rev-list master), right?

You could go about determining the revision walk boundaries and
combine them to set up a revision walk to splice the master line, but
what is the point of that?  You'll only be painting yourself into a
design-corner (you won't be able to do other kinds of filtering), and
going around your head to touch your nose.  You precisely want list
intersection: so write an efficient list intersection in the language
of your choice.  Why is it a poor man's solution?  If anything, your
convoluted rev-list solution will probably be more complicated,
slower, and bug-ridden.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 13:20 ` Thomas Rast
@ 2013-06-20 16:24   ` Francis Moreau
  2013-06-24  9:59     ` Thomas Rast
  0 siblings, 1 reply; 12+ messages in thread
From: Francis Moreau @ 2013-06-20 16:24 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git@vger.kernel.org

Hi,

On Thu, Jun 20, 2013 at 3:20 PM, Thomas Rast <trast@inf.ethz.ch> wrote:
> Francis Moreau <francis.moro@gmail.com> writes:
>>
>> But I'm wondering if someone can see another solution more elegant ?
>
> I think there's a cute way.  Suppose your arguments are of the form

Really nice !

>
>   p1 p2 ... --not n1 n2 ...
>
> that is each pX is positive, and each nX is negative.  Then as you
> observed, building the difference with master is easy: just add it to
> the negative args.

I didn't know that git-rev-parse could be used to transform any range
specification into that form (p1 p2 .. -not n1 n2..)

>
> Intersecting with master is harder, because you don't know what parts of
> it (if any) are in the range.  But the --boundary option can help: these
> are the commits where the positive and negative ranges "first" met, and
> prevented the walk from continuing.
>
> So the part of master reachable from p1, p2, etc. is exactly the set of
> boundary commits of 'p1 p2 ... ^master'.  And on top of that, excluding
> the parts reachable from the n's is easy.  So you can do:

Really clever.

>
>   positive=$(git rev-parse "$@" | grep -v '^\^')
>   negative=$(git rev-parse "$@" | grep '^\^')
>   boundary=$(git rev-list --boundary $positive ^master | sed -n 's/^-//p')
>   # the intersection is
>   git rev-list $boundary $negative

I think there's a minor issue here, when boundary is empty. Please
correct me if I'm wrong but I think it can only happen if positive is
simply master or a subset of master. In that case I think the solution
is just make boundary equal to positive:

     # the intersection is
     git rev-list ${boundary:-$positive} $negative

Now I'm going to see if that solution is faster than the initial one.

Great Thanks
--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 13:47     ` Ramkumar Ramachandra
@ 2013-06-21  7:15       ` Francis Moreau
  2013-06-21  7:19         ` Ramkumar Ramachandra
  0 siblings, 1 reply; 12+ messages in thread
From: Francis Moreau @ 2013-06-21  7:15 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: git@vger.kernel.org

Hi,

On Thu, Jun 20, 2013 at 3:47 PM, Ramkumar Ramachandra
<artagnon@gmail.com> wrote:
> Francis Moreau wrote:
>> Basically I have an initial set (or can be several different sets)
>> expressed as a revision specification described by git-rev-list man
>> page. I just want to find the common set of commit which are part of
>> the initial sets *and* is reachable by master.
>
> That's just a generic list intersection between
>
>   [a, b, c] and [d, e, f]
>
> no?  [a, b, c] is a list you built up somehow, and [d, e, f] comes
> from $(git rev-list master), right?

yes.

>
> You could go about determining the revision walk boundaries and
> combine them to set up a revision walk to splice the master line, but
> what is the point of that?

Well, that seems to me a more elegant solution and I was curious about
doing this with git-rev-list only if possible.

>  You'll only be painting yourself into a
> design-corner (you won't be able to do other kinds of filtering), and
> going around your head to touch your nose.

I think what Thomas proposed is fine.

>  You precisely want list
> intersection: so write an efficient list intersection in the language
> of your choice.  Why is it a poor man's solution?

Sorry my wording was poor. I just meant that it was the obvious
solution that I don't find nice. But your implementation was good.

>  If anything, your
> convoluted rev-list solution will probably be more complicated,
> slower, and bug-ridden.

Slower ? why do you think Thomas' solution is slower than the obvious one ?

Thanks
--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-21  7:15       ` Francis Moreau
@ 2013-06-21  7:19         ` Ramkumar Ramachandra
  0 siblings, 0 replies; 12+ messages in thread
From: Ramkumar Ramachandra @ 2013-06-21  7:19 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git@vger.kernel.org

Francis Moreau wrote:
> Slower ? why do you think Thomas' solution is slower than the obvious one ?

There's really only one way to find out: try it and see. YMMV
depending on your data.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-20 16:24   ` Francis Moreau
@ 2013-06-24  9:59     ` Thomas Rast
  2013-06-25  8:09       ` Francis Moreau
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Rast @ 2013-06-24  9:59 UTC (permalink / raw)
  To: Francis Moreau; +Cc: git@vger.kernel.org

Francis Moreau <francis.moro@gmail.com> writes:

> On Thu, Jun 20, 2013 at 3:20 PM, Thomas Rast <trast@inf.ethz.ch> wrote:
>>   positive=$(git rev-parse "$@" | grep -v '^\^')
>>   negative=$(git rev-parse "$@" | grep '^\^')
>>   boundary=$(git rev-list --boundary $positive ^master | sed -n 's/^-//p')
>>   # the intersection is
>>   git rev-list $boundary $negative
>
> I think there's a minor issue here, when boundary is empty. Please
> correct me if I'm wrong but I think it can only happen if positive is
> simply master or a subset of master. In that case I think the solution
> is just make boundary equal to positive:
>
>      # the intersection is
>      git rev-list ${boundary:-$positive} $negative
>
> Now I'm going to see if that solution is faster than the initial one.

Jan "jast" Krüger pointed out on #git that

  git log $(git merge-base --all A B)

is exactly the set of commits reachable from both A and B; so there's
your intersection operator :-)

So it would seem that a much simpler approach is

  git rev-list $(git merge-base --all master $positive) --not $negative

avoiding the boundary handling and special-case.  It relies on the
(weird?) property that $(git merge-base --all A B1 B2 ...) shows the
merge bases of A with a hypothetical merge of B1, B2, ..., which is just
what you need here.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Splitting a rev list into 2 sets
  2013-06-24  9:59     ` Thomas Rast
@ 2013-06-25  8:09       ` Francis Moreau
  0 siblings, 0 replies; 12+ messages in thread
From: Francis Moreau @ 2013-06-25  8:09 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git@vger.kernel.org

Hello Thomas,

On Mon, Jun 24, 2013 at 11:59 AM, Thomas Rast <trast@inf.ethz.ch> wrote:
> Francis Moreau <francis.moro@gmail.com> writes:
>
>> On Thu, Jun 20, 2013 at 3:20 PM, Thomas Rast <trast@inf.ethz.ch> wrote:
>>>   positive=$(git rev-parse "$@" | grep -v '^\^')
>>>   negative=$(git rev-parse "$@" | grep '^\^')
>>>   boundary=$(git rev-list --boundary $positive ^master | sed -n 's/^-//p')
>>>   # the intersection is
>>>   git rev-list $boundary $negative
>>
>> I think there's a minor issue here, when boundary is empty. Please
>> correct me if I'm wrong but I think it can only happen if positive is
>> simply master or a subset of master. In that case I think the solution
>> is just make boundary equal to positive:
>>
>>      # the intersection is
>>      git rev-list ${boundary:-$positive} $negative
>>
>> Now I'm going to see if that solution is faster than the initial one.
>
> Jan "jast" Krüger pointed out on #git that
>
>   git log $(git merge-base --all A B)
>
> is exactly the set of commits reachable from both A and B; so there's
> your intersection operator :-)

nice :)

>
> So it would seem that a much simpler approach is
>
>   git rev-list $(git merge-base --all master $positive) --not $negative
>
> avoiding the boundary handling and special-case.  It relies on the
> (weird?) property that $(git merge-base --all A B1 B2 ...) shows the
> merge bases of A with a hypothetical merge of B1, B2, ..., which is just
> what you need here.

Thank you Thomas, that's exactly what I was asking for :)

--
Francis

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-06-25  8:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-20 10:14 Splitting a rev list into 2 sets Francis Moreau
2013-06-20 11:26 ` Ramkumar Ramachandra
2013-06-20 13:12   ` Francis Moreau
2013-06-20 13:47     ` Ramkumar Ramachandra
2013-06-21  7:15       ` Francis Moreau
2013-06-21  7:19         ` Ramkumar Ramachandra
2013-06-20 13:04 ` Phil Hord
2013-06-20 13:17   ` Francis Moreau
2013-06-20 13:20 ` Thomas Rast
2013-06-20 16:24   ` Francis Moreau
2013-06-24  9:59     ` Thomas Rast
2013-06-25  8:09       ` Francis Moreau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).