git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
Date: Sat, 7 Nov 2020 11:39:39 -0800	[thread overview]
Message-ID: <CABPp-BEzZ18KfX0kgOja+yeXbuEZmMORWjGePUMTdyzvqXLkFg@mail.gmail.com> (raw)
In-Reply-To: <ca61acbf-1757-5ddc-49f7-6da0cba4a647@gmail.com>

On Sat, Nov 7, 2020 at 7:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/7/20 1:06 AM, Elijah Newren wrote:
> > Hi Derrick,
> >
> > On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
> >> All that said, for testing either branch you just need to first set
> >> pull.twohead=ort in your git config (see
> >> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
> >> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
> >
> > I probably also should have mentioned that merge-ort does not (yet?)
> > heed merge.renames configuration setting; it always detects renames.
> > I know you run with merge.renames=false, so you won't quite get an
> > apples-to-apples comparison.  However, part of my point was I wanted
> > to make renames fast enough that they could be left turned on, even
> > for the large scale repos, so I'm very interested in your experience.
> > If you need an escape hatch, though, just put a "return 1" at the top
> > of detect_and_process_renames() to turn it off.
> >
> > Oh, and I went through and re-merged all the merge commits in the
> > linux kernel and found a bug in merge-ort while doing that (causing it
> > to die, not to merge badly).  I'm kind of surprised that none of my
> > testcases triggered that failure earlier; if you're testing it out,
> > you might want to update to get the fix (commit 067e5c1a38,
> > "merge-ort: fix bug with cached_target_names not being initialized in
> > redos", 2020-11-06).
>
> I did manage to do some testing to see what happens with
> a large repo under a small sparse-checkout. And using
> trace2, I was able to see that your code is being exercised.
> Unfortunately, I didn't see any performance improvement, and
> that is likely due to needing to expand the index entirely
> when checking out the merge commit.
>
> Is there a command to construct a merge commit without
> actually checking it out? That would reduce the time spent
> expanding the index, which would allow your algorithm to
> really show its benefits!

Wow, very interesting.  I am working on a --remerge-diff option for
log, which implies -p and is similar to -c or --cc in that it makes
merge commits show a diff, but which in particular remerges the two
parent commits complete with conflict markers and such and then diffs
the merge commit against that intermediate remerge.  That's a case
that constructs a merge commit without ever touching the index (or
working tree)...but there's no equivalent comparison point for
merge-recursive.  So, it doesn't provide something to compare against
(and while the code can be used I don't actually have a --remerge-diff
option yet -- it just hardcodes the behavior on if wanted or not), so
I'm not sure if you'd be interested in it.  If you are, let me know
though, and I'll send details.

However, I'm really surprised here, because merge-recursive always
reads and writes the index too (the index is the basis for its whole
algorithm).  In fact, merge-recursive always reads the index at least
*twice* (it unconditionally discards and re-reads the index), so you
must have some kind of specialized tweaking of merge-recursive if it
somehow avoids a full index read/write.  In order to do an
apples-to-apples comparison, we'd need to make those same tweaks to
merge-ort, but I don't have a clue what kind of tweaks you've made
here.  So, some investigation points:

*1*. Could you give me the accumulated times from the trace2_regions
so we can verify where the time is spent?  The 'summarize-perf' script
at the toplevel of the repo in my ort branch might be helpful for
this; just prefix any git command with that script and it accumulates
the trace2 region times and prints them out.  For example, I could run
'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:

=== BEGIN OUTPUT ===
$ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto
HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
Done.
Accumulated times:
    0.031 : <unmeasured> ( 3.2%)
    0.837 : 35 : label:incore_nonrecursive
       0.003 : <unmeasured> ( 0.4%)
       0.476 : 41 : ..label:collect_merge_info
          0.001 : <unmeasured> ( 0.2%)
          0.475 : 41 : ....label:traverse_trees
       0.298 : 41 : ..label:renames
          0.015 : <unmeasured> ( 5.1%)
          0.280 : 41 : ....label:regular renames
             0.036 : <unmeasured> (12.7%)
             0.244 : 6 : ......label:diffcore_rename
                0.001 : <unmeasured> ( 0.4%)
                0.078 : 6 : ........label:dir rename setup
                0.055 : 6 : ........label:basename matches
                0.051 : 6 : ........label:exact renames
                0.031 : 6 : ........label:write back to queue
                0.017 : 6 : ........label:setup
                0.009 : 6 : ........label:cull basename
                0.003 : 6 : ........label:cull exact
          0.002 : 35 : ....label:directory renames
          0.001 : 35 : ....label:process renames
       0.052 : 35 : ..label:process_entries
          0.001 : <unmeasured> ( 1.7%)
          0.033 : 35 : ....label:processing
          0.017 : 35 : ....label:process_entries setup
             0.001 : <unmeasured> ( 5.8%)
             0.008 : 35 : ......label:plist copy
             0.008 : 35 : ......label:plist sort
             0.000 : 35 : ......label:plist grow
          0.001 : 35 : ....label:finalize
       0.005 : 35 : ..label:merge_start
          0.001 : <unmeasured> (18.8%)
          0.004 : 34 : ....label:reset_maps
          0.000 : 35 : ....label:sanity checks
          0.000 : 1 : ....label:allocate/init
       0.003 : 6 : ..label:reset_maps
    0.035 : 1 : label:do_write_index
/home/newren/floss/linux-stable/.git/index.lock
    0.034 : 1 : label:checkout
       0.034 : <unmeasured> (99.9%)
       0.000 : 1 : ..label:Filtering content
    0.009 : 1 : label:do_read_index .git/index
    0.000 : 1 : label:write_auto_merge
    0.000 : 1 : label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 679):
0.006790000000000001
Timing including forking:  0.960 (0.013 additional seconds)
=== END OUTPUT ===
This was a run that took just under 1s (and was a hot-cache case; I
had just done the same rebase before to warm the caches), and the
combination of index/working tree bits (everything at and after
do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
corresponding to just over 8.1% of overall time.  I'm curious where
that lands for your repository testcase; if the larger time ends up
somewhere under the indented label:incore_nonrecursive region, then
it's due to something other than index reading/updating/writing.

*2*. If it really is due to index reading/updating/writing, then index
handling in merge-ort is confined to two functions: checkout() and
record_unmerged_index_entries().  Both functions aren't too long, and
neither one calls into any other function within merge-ort.c.
(Further, checkout() is a near copy of code from merge_working_tree()
in builtin/checkout.c, or at least a copy of that function from a year
or so ago.)  As such, it's possible you can go in and make whatever
special tweaks you have for partial index reading/writing to those
functions.

I'm curious to hear back more on this.

Elijah

  reply	other threads:[~2020-11-07 19:39 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren
2020-11-12 20:15   ` Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren [this message]
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BEzZ18KfX0kgOja+yeXbuEZmMORWjGePUMTdyzvqXLkFg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).