git@vger.kernel.org mailing list mirror (one of many)
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v4 00/10] Optimization batch 8: use file basenames even more
Date: Sat, 27 Feb 2021 00:30:38 +0000
Message-ID: <pull.844.v4.git.1614385849.gitgitgadget@gmail.com>
In-Reply-To: <pull.844.v3.git.1614304699.gitgitgadget@gmail.com>

This series depends on en/diffcore-rename (a concatenation of what I was
calling ort-perf-batch-6 and ort-perf-batch-7).

Changes since v3:

 * Update the commit messages (one was out of date after the rearrangement),
   and include Stolee's Reviewed-by

Elijah Newren (10):
  diffcore-rename: use directory rename guided basename comparisons
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: add a mapping of destination names to their indices
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add function for clearing dir_rename_count
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_guess from dir_rename_counts

 Documentation/gitdiffcore.txt |   2 +-
 diffcore-rename.c             | 449 ++++++++++++++++++++++++++++++++--
 diffcore.h                    |   7 +
 merge-ort.c                   | 144 +----------
 4 files changed, 449 insertions(+), 153 deletions(-)


base-commit: aeca14f748afc7fb5b65bca56ea2ebd970729814
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-844%2Fnewren%2Fort-perf-batch-8-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-844/newren/ort-perf-batch-8-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/844

Range-diff vs v3:

  1:  6afa9add40b9 !  1:  823d07532e00 diffcore-rename: use directory rename guided basename comparisons
     @@ Commit message
          min_basename_score threshold required for marking the two files as
          renames.
      
     -    This commit introduces an idx_possible_rename() function which will give
     +    This commit introduces an idx_possible_rename() function which will
          do this directory rename detection for us and give us the index within
          rename_dst of the resulting filename.  For now, this function is
          hardcoded to return -1 (not found) and just hooks up how its results
          would be used once we have a more complete implementation in place.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## Documentation/gitdiffcore.txt ##
  2:  40f57bcc2055 !  2:  2dde621d7de5 diffcore-rename: add a new idx_possible_rename function
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    diffcore-rename: add a new idx_possible_rename function
     +    diffcore-rename: provide basic implementation of idx_possible_rename()
      
     -    find_basename_matches() is great when both the remaining set of possible
     -    rename sources and the remaining set of possible rename destinations
     -    have exactly one file each with a given basename.  It allows us to match
     -    up files that have been moved to different directories without changing
     -    filenames.
     +    Add a new struct dir_rename_info with various values we need inside our
     +    idx_possible_rename() function introduced in the previous commit.  Add a
     +    basic implementation for this function showing how we plan to use the
     +    variables, but which will just return early with a value of -1 (not
     +    found) when those variables are not set up.
      
     -    When basenames are not unique, though, we want to be able to guess which
     -    directories the source files have been moved to.  Since this is the job
     -    of directory rename detection, we employ it.  However, since it is a
     -    directory rename detection idea, we also limit it to cases where we know
     -    there could have been a directory rename, i.e. where the source
     -    directory has been removed.  This has to be signalled by dirs_removed
     -    being non-NULL and containing an entry for the relevant directory.
     -    Since merge-ort.c is the only caller that currently does so, this
     -    optimization is only effective for merge-ort right now.  In the future,
     -    this condition could be reconsidered or we could modify other callers to
     -    pass the necessary strset.
     -
     -    Anyway, that's a lot of background so that we can actually describe the
     -    new function.  Add an idx_possible_rename() function which combines the
     -    recently added dir_rename_guess and idx_map fields to provide the index
     -    within rename_dst of a potential match for a given file.
     -
     -    Future commits will add checks after calling this function to compare
     -    the resulting 'likely rename' candidates to see if the two files meet
     -    the elevated min_basename_score threshold for marking them as actual
     -    renames.
     +    Future commits will do the work necessary to set up those other
     +    variables so that idx_possible_rename() does not always return -1.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  3:  0e14961574ea !  3:  21b9cf1da30e diffcore-rename: add a mapping of destination names to their indices
     @@ Commit message
          dir_rename_guess; these will be more fully populated in subsequent
          commits.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  4:  9b9d5b207b03 !  4:  3617b0209cc4 Move computation of dir_rename_count from merge-ort to diffcore-rename
     @@ Commit message
          preliminary computation of dir_rename_count after exact rename
          detection, followed by some updates after inexact rename detection.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  5:  f286e89464ea !  5:  2baf39d82f3e diffcore-rename: add function for clearing dir_rename_count
     @@ Commit message
          for clearing, or partially clearing it out.  Add a
          partial_clear_dir_rename_count() function for this purpose.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  6:  ab353f2e75eb !  6:  02f1f7c02d32 diffcore-rename: move dir_rename_counts into dir_rename_info struct
     @@ Commit message
          dir_rename_info struct.  Future commits will then make dir_rename_counts
          be computed in stages, and add computation of dir_rename_guess.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  7:  bd50d9e53804 !  7:  9c3436840534 diffcore-rename: extend cleanup_dir_rename_info()
     @@ Commit message
          Extend cleanup_dir_rename_info() to handle these two different cases,
          cleaning up the relevant bits of information for each case.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  8:  44cfae6505f2 !  8:  6bd398d3707e diffcore-rename: compute dir_rename_counts in stages
     @@ Commit message
          augment the counts via calling update_dir_rename_counts() after each
          basename-guide and inexact rename detection match is found.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
  9:  752aff3a7995 !  9:  46304aaebf5a diffcore-rename: limit dir_rename_counts computation to relevant dirs
     @@ Commit message
          info->relevant_source_dirs variable for this purpose, even though at
          this stage we will only set it to dirs_removed for simplicity.
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##
 10:  65f7bfb735f2 ! 10:  4be565c47208 diffcore-rename: compute dir_rename_guess from dir_rename_counts
     @@ Commit message
              mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
              just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s
      
     +    Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## diffcore-rename.c ##

-- 
gitgitgadget


Thread overview: 61+ messages
2021-02-14  7:58 [PATCH 00/10] Optimization batch 8: use file basenames even more Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 02/10] diffcore-rename: add functions for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 04/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 05/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 06/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 08/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 10/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-23 23:43 ` [PATCH v2 00/10] Optimization batch 8: use file basenames even more Elijah Newren via GitGitGadget
2021-02-23 23:43   ` [PATCH v2 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-24 15:25     ` Derrick Stolee
2021-02-24 18:50       ` Elijah Newren
2021-02-23 23:43   ` [PATCH v2 02/10] diffcore-rename: add functions for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 04/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-24 15:37     ` Derrick Stolee
2021-02-25  2:16     ` Ævar Arnfjörð Bjarmason
2021-02-25  2:26       ` Ævar Arnfjörð Bjarmason
2021-02-25  2:34       ` Junio C Hamano
2021-02-23 23:44   ` [PATCH v2 05/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-24 15:43     ` Derrick Stolee
2021-02-23 23:44   ` [PATCH v2 06/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 08/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-24 17:35     ` Derrick Stolee
2021-02-25  1:13       ` Elijah Newren
2021-02-23 23:44   ` [PATCH v2 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 10/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-24 17:44     ` Derrick Stolee
2021-02-24 17:50   ` [PATCH v2 00/10] Optimization batch 8: use file basenames even more Derrick Stolee
2021-02-25  1:38     ` Elijah Newren
2021-02-26  1:58   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 01/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 02/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-26 15:52       ` Derrick Stolee
2021-02-26  1:58     ` [PATCH v3 03/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 04/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-26 15:55       ` Derrick Stolee
2021-02-26  1:58     ` [PATCH v3 05/10] diffcore-rename: add function for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 06/10] diffcore-rename: move dir_rename_counts into dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 07/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 08/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 10/10] diffcore-rename: compute dir_rename_guess from dir_rename_counts Elijah Newren via GitGitGadget
2021-02-26 16:34     ` [PATCH v3 00/10] Optimization batch 8: use file basenames even more Derrick Stolee
2021-02-26 19:28       ` Elijah Newren
2021-02-27  0:30     ` Elijah Newren via GitGitGadget [this message]
2021-02-27  0:30       ` [PATCH v4 01/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 02/10] diffcore-rename: provide basic implementation of idx_possible_rename() Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 03/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 04/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 05/10] diffcore-rename: add function for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 06/10] diffcore-rename: move dir_rename_counts into dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 07/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 08/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 10/10] diffcore-rename: compute dir_rename_guess from dir_rename_counts Elijah Newren via GitGitGadget
2021-03-09 21:52       ` [PATCH v4 00/10] Optimization batch 8: use file basenames even more Derrick Stolee
