From: Elijah Newren <email@example.com> To: Derrick Stolee <firstname.lastname@example.org> Cc: Git Mailing List <email@example.com> Subject: Re: [PATCH v2 04/20] merge-ort: use histogram diff Date: Wed, 11 Nov 2020 08:47:20 -0800 [thread overview] Message-ID: <CABPp-BFd2Wg-9h9+Gx20szX=YfKLqvBMWGg-eZU0yL8_DPh6kw@mail.gmail.com> (raw) In-Reply-To: <firstname.lastname@example.org> On Wed, Nov 11, 2020 at 5:54 AM Derrick Stolee <email@example.com> wrote: > > On 11/2/2020 3:43 PM, Elijah Newren wrote: > > I have some ideas for using a histogram diff to improve content merges, > > which fundamentally relies on the idea of a histogram. Since the diffs > > are never displayed to the user but just used internally for merging, > > the typical user preference shouldn't matter anyway, and I want to make > > sure that all my testing works with this algorithm. > > > > Granted, I don't yet know if those ideas will pan out and I haven't even > > tried any of them out yet, but it's easy to change the diff algorithm in > > the future if needed or wanted. For now, just set it to histogram. > > If you are not making use of the histogram yet, then could you set this > patch aside until you _do_ use it? Or are there performance implications > that are also a side benefit? Long story... git folks tend to value performance pretty strongly -- including sometimes valuing it OVER correctness. For example, if fast-export completely munges some merge commits and you send your first ever patch in to the list to fix it (by turning on topo_order), you might run into folks asking if it should be made an option so that we don't slow down exports except for people who happen to know they need it (and thus risk breaking exports for people who happen to be unaware that they need it...). Luckily, Peff bailed me out in that situation by doing some timings and finding that topo_order actually made fast-export _faster_, to everyone's surprise at the time. Or, to take another example, perhaps someone will introduce some commit-date cutoff on the revision walking machinery that sometimes breaks the answer but makes things faster...and then causes a bunch of headaches years down the road when someone tries to introduce commit-graphs to get us always correct _and_ fast answers. In this case, histogram diffs in my cursory investigation are about 2% slower than Myers diffs. I think others may have done even more detailed benchmarks. They've been around for years, but haven't been made the default, despite giving obviously better looking diffs to users in a number of cases where Myers diffs are unintelligible. But, far more importantly, there are real merge bugs we know about that are even affecting git.git and linux.git that I don't have a clue how to address without the additional information that I believe is provided by histogram diffs. See the following: https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/ https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/ https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/ I don't like mismerges. I really don't like silent mismerges. While I am sometimes willing to make performance and correctness tradeoff, I'm much more interested in correctness in general. I want to fix the above bugs. I have not yet started doing so, but I believe histogram diff at least gives me an angle. But I can't rely on using the information from histogram diff unless it's in use. And it hasn't been used because of a few percentage performance hit. But, since I happen to be speeding up typical non-trivial merges/rebases/cherry-picks by factors of 10 or more (at least, anywhere that read-and-update-the-index is a trivial percentage of overall time instead of 99.9% of overall time like it is for you), now is golden opportunity to switch out the underlying diff algorithm so that I can get the data I need to fix the bugs I know are there. Whether I can actually fix them is yet to be seen; I won't even start until merge-ort is complete and merged. Does that help?
next prev parent reply other threads:[~2020-11-11 16:47 UTC|newest] Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren 2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren 2020-11-06 22:05 ` Jonathan Tan 2020-11-06 22:45 ` Elijah Newren 2020-11-09 20:55 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren 2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren 2020-11-11 13:52 ` Derrick Stolee 2020-11-11 16:22 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren 2020-11-11 13:54 ` Derrick Stolee 2020-11-11 16:47 ` Elijah Newren [this message] 2020-11-11 16:51 ` Derrick Stolee 2020-11-11 17:03 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren 2020-11-11 13:58 ` Derrick Stolee 2020-11-11 17:07 ` Elijah Newren 2020-11-11 17:10 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren 2020-11-06 22:19 ` Jonathan Tan 2020-11-06 23:10 ` Elijah Newren 2020-11-09 20:59 ` Jonathan Tan 2020-11-11 14:38 ` Derrick Stolee 2020-11-11 17:02 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren 2020-11-11 14:51 ` Derrick Stolee 2020-11-11 17:13 ` Elijah Newren 2020-11-11 17:21 ` Eric Sunshine 2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren 2020-11-06 22:52 ` Jonathan Tan 2020-11-06 23:41 ` Elijah Newren 2020-11-09 22:04 ` Jonathan Tan 2020-11-09 23:05 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren 2020-11-06 22:58 ` Jonathan Tan 2020-11-07 0:26 ` Elijah Newren 2020-11-09 22:09 ` Jonathan Tan 2020-11-09 23:08 ` Elijah Newren 2020-11-11 15:26 ` Derrick Stolee 2020-11-11 18:16 ` Elijah Newren 2020-11-11 22:06 ` Elijah Newren 2020-11-12 18:23 ` Derrick Stolee 2020-11-12 18:39 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren 2020-11-11 15:31 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren 2020-11-11 19:51 ` Jonathan Tan 2020-11-12 1:48 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren 2020-11-11 16:09 ` Derrick Stolee 2020-11-11 18:58 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren 2020-11-11 20:01 ` Jonathan Tan 2020-11-11 20:24 ` Elijah Newren 2020-11-12 20:39 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren 2020-11-11 20:47 ` Jonathan Tan 2020-11-11 21:21 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren 2020-11-12 20:15 ` Jonathan Tan 2020-11-12 22:30 ` Elijah Newren 2020-11-24 20:19 ` Elijah Newren 2020-11-25 2:07 ` Jonathan Tan 2020-11-26 18:13 ` Elijah Newren 2020-11-30 18:41 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren 2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren 2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee 2020-11-03 16:36 ` Elijah Newren 2020-11-07 6:06 ` Elijah Newren 2020-11-07 15:02 ` Derrick Stolee 2020-11-07 19:39 ` Elijah Newren 2020-11-09 12:30 ` Derrick Stolee 2020-11-09 17:13 ` Elijah Newren 2020-11-09 19:51 ` Derrick Stolee 2020-11-09 22:44 ` Elijah Newren 2020-11-11 17:08 ` Derrick Stolee 2020-11-11 18:35 ` Elijah Newren 2020-11-11 20:48 ` Derrick Stolee 2020-11-11 21:18 ` Elijah Newren 2020-11-29 7:43 [PATCH " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CABPp-BFd2Wg-9h9+Gx20szX=YfKLqvBMWGg-eZU0yL8_DPh6kw@mail.gmail.com' \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: [PATCH v2 04/20] merge-ort: use histogram diff' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
email@example.com list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ firstname.lastname@example.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git