git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ramkumar Ramachandra <artagnon@gmail.com>
To: Francis Moreau <francis.moro@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Splitting a rev list into 2 sets
Date: Thu, 20 Jun 2013 16:56:26 +0530	[thread overview]
Message-ID: <CALkWK0=6ZofURGvC-FtS81765yDsA9+0wW94riPZUPudc_nDyw@mail.gmail.com> (raw)
In-Reply-To: <CAC9WiBi-E+LN4hKGeu0mG7ihJWCaTg-W1Dx_PWmX_vsx-uLOaw@mail.gmail.com>

Francis Moreau wrote:
> To get the commit set which can't be reached by master (ie commits
> which are specific to branches other than master) I would do:
>
>   # "$@" is the range spec passed to the script
>   git rev-list "$@" ^master | check_other_commit
>
> But I don't know if it's possible to use a different git-rev-list
> command to get the rest of the commits, ie the ones that are reachable
> by the specified range and master.
>
> One way to do that is to record the first commit set got by the first
> rev-list command and check that the ones returned by "git rev-list $@"
> are not in the record.

I don't fully understand your query, because almost anything is
possible with rev-list:

  $ git rev-list foo..bar master # reachable from master, bar, not foo

What I _suspect_ you're asking is for help when you can't construct
this "foo..bar master" programmatically (or when you cannot express
your criterion as arguments to rev-list).  You want an initial commit
set, and filter it at various points in your program using various
criteria, right?  In that case, I'd suggest something like this:

    # Returns a list of commits given a committish that `rev-list`
    # accepts.
    def self.list_commits(committish)
        commits = []
        revlist = execute("git", "rev-list", "--reverse", "--date-order",
                          "--simplify-merges", committish).chomp.split("\n")

        # do it in batches of 1000 commits
        while revlist
            these_revs = revlist.first(1000).join("\n")
            this_chunk = execute({ :in => these_revs }, "git",
                               "cat-file", "--batch")

            # parse_cat_file parses the chunk and updates @commit_index
            parse_cat_file(this_chunk) { |struct| commits << struct }

            revlist = revlist[1000 .. revlist.length - 1]
        end
        return commits
    end

    # Filters a list of commits with the precondition that it exists
    # in the committish.  :sha1 is used to uniquely identify a commit.
    def self.filter_commits(commits, committish)
        revlist = execute("git", "rev-list", "--simplify-merges",
                                 committish).split("\n")
        allowed_commits = revlist.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

In essence, I use '&' to filter and it's extremely fast.  The trick is
to shell out to git sparingly, store the data you get in a sensible
manner, and build fast custom filters based on what you want.  Here
are a few more examples:

    # Filters a list of commits with the precondition that it is a
    # first-parent commit in a given committish.
    def self.filter_fp_commits(commits, committish)
        revlist = execute("git", "rev-list", "--first-parent",
                          "--simplify-merges", committish).split("\n")
        allowed_commits = revlist.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

    # Slice a list of commits using a start_hex and end_hex, which
    # may both be nil.
    def self.slice_commits(commits, start_commit, end_commit)
        start_idx = commits.index(start_commit)
        end_idx = commits.index(end_commit)
        start_idx = 0 if start_idx.nil?
        end_idx = commits.size - 1 if end_idx.nil?
        return commits[start_idx..end_idx]
    end

    def self.filter_commits_tree_path(commits, path)
        commit_chunk = (commits.map { |commit| commit.sha1 }).join("\n")
        commit_chunk = "#{commit_chunk}\n"
        diff_tree_chunk = execute({ :in => commit_chunk }, "git", "diff-tree", \
                                  "-m", "-r", "-s", "--stdin", path)
        matching_sha1s = diff_tree_chunk.split("\n")
        allowed_commits = matching_sha1s.map { |sha1| @commit_index[sha1.hex] }
        return commits & allowed_commits
    end

Did that help?

  reply	other threads:[~2013-06-20 11:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-20 10:14 Splitting a rev list into 2 sets Francis Moreau
2013-06-20 11:26 ` Ramkumar Ramachandra [this message]
2013-06-20 13:12   ` Francis Moreau
2013-06-20 13:47     ` Ramkumar Ramachandra
2013-06-21  7:15       ` Francis Moreau
2013-06-21  7:19         ` Ramkumar Ramachandra
2013-06-20 13:04 ` Phil Hord
2013-06-20 13:17   ` Francis Moreau
2013-06-20 13:20 ` Thomas Rast
2013-06-20 16:24   ` Francis Moreau
2013-06-24  9:59     ` Thomas Rast
2013-06-25  8:09       ` Francis Moreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALkWK0=6ZofURGvC-FtS81765yDsA9+0wW94riPZUPudc_nDyw@mail.gmail.com' \
    --to=artagnon@gmail.com \
    --cc=francis.moro@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).