From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v4 00/10] Optimization batch 8: use file basenames even more
Date: Sat, 27 Feb 2021 00:30:38 +0000
Message-ID: <pull.844.v4.git.1614385849.gitgitgadget@gmail.com>
In-Reply-To: <pull.844.v3.git.1614304699.gitgitgadget@gmail.com>

This series depends on en/diffcore-rename (a concatenation of what I was
calling ort-perf-batch-6 and ort-perf-batch-7).

Changes since v3:

 * Update the commit messages (one was out of date after the rearrangement),
   and include Stolee's Reviewed-by

Elijah Newren (10):
  diffcore-rename: use directory rename guided basename comparisons
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: add a mapping of destination names to their indices
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add function for clearing dir_rename_count
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_guess from dir_rename_counts

 Documentation/gitdiffcore.txt |   2 +-
 diffcore-rename.c             | 449 ++++++++++++++++++++++++++++++++--
 diffcore.h                    |   7 +
 merge-ort.c                   | 144 +----------
 4 files changed, 449 insertions(+), 153 deletions(-)

base-commit: aeca14f748afc7fb5b65bca56ea2ebd970729814
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-844%2Fnewren%2Fort-perf-batch-8-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-844/newren/ort-perf-batch-8-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/844

Range-diff vs v3:
1: 6afa9add40b9 ! 1: 823d07532e00 diffcore-rename: use directory rename guided basename comparisons
@@ Commit message
min_basename_score threshold required for marking the two files as
renames.
- This commit introduces an idx_possible_rename() function which will give
+ This commit introduces an idx_possible_rename() function which will
do this directory rename detection for us and give us the index within
rename_dst of the resulting filename. For now, this function is
hardcoded to return -1 (not found) and just hooks up how its results
would be used once we have a more complete implementation in place.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## Documentation/gitdiffcore.txt ##
2: 40f57bcc2055 ! 2: 2dde621d7de5 diffcore-rename: add a new idx_possible_rename function
@@ Metadata
Author: Elijah Newren <newren@gmail.com>
## Commit message ##
- diffcore-rename: add a new idx_possible_rename function
+ diffcore-rename: provide basic implementation of idx_possible_rename()
- find_basename_matches() is great when both the remaining set of possible
- rename sources and the remaining set of possible rename destinations
- have exactly one file each with a given basename. It allows us to match
- up files that have been moved to different directories without changing
- filenames.
+ Add a new struct dir_rename_info with various values we need inside our
+ idx_possible_rename() function introduced in the previous commit. Add a
+ basic implementation for this function showing how we plan to use the
+ variables, but which will just return early with a value of -1 (not
+ found) when those variables are not set up.
- When basenames are not unique, though, we want to be able to guess which
- directories the source files have been moved to. Since this is the job
- of directory rename detection, we employ it. However, since it is a
- directory rename detection idea, we also limit it to cases where we know
- there could have been a directory rename, i.e. where the source
- directory has been removed. This has to be signalled by dirs_removed
- being non-NULL and containing an entry for the relevant directory.
- Since merge-ort.c is the only caller that currently does so, this
- optimization is only effective for merge-ort right now. In the future,
- this condition could be reconsidered or we could modify other callers to
- pass the necessary strset.
-
- Anyway, that's a lot of background so that we can actually describe the
- new function. Add an idx_possible_rename() function which combines the
- recently added dir_rename_guess and idx_map fields to provide the index
- within rename_dst of a potential match for a given file.
-
- Future commits will add checks after calling this function to compare
- the resulting 'likely rename' candidates to see if the two files meet
- the elevated min_basename_score threshold for marking them as actual
- renames.
+ Future commits will do the work necessary to set up those other
+ variables so that idx_possible_rename() does not always return -1.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
3: 0e14961574ea ! 3: 21b9cf1da30e diffcore-rename: add a mapping of destination names to their indices
@@ Commit message
dir_rename_guess; these will be more fully populated in subsequent
commits.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
4: 9b9d5b207b03 ! 4: 3617b0209cc4 Move computation of dir_rename_count from merge-ort to diffcore-rename
@@ Commit message
preliminary computation of dir_rename_count after exact rename
detection, followed by some updates after inexact rename detection.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
5: f286e89464ea ! 5: 2baf39d82f3e diffcore-rename: add function for clearing dir_rename_count
@@ Commit message
for clearing, or partially clearing it out. Add a
partial_clear_dir_rename_count() function for this purpose.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
6: ab353f2e75eb ! 6: 02f1f7c02d32 diffcore-rename: move dir_rename_counts into dir_rename_info struct
@@ Commit message
dir_rename_info struct. Future commits will then make dir_rename_counts
be computed in stages, and add computation of dir_rename_guess.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
7: bd50d9e53804 ! 7: 9c3436840534 diffcore-rename: extend cleanup_dir_rename_info()
@@ Commit message
Extend cleanup_dir_rename_info() to handle these two different cases,
cleaning up the relevant bits of information for each case.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
8: 44cfae6505f2 ! 8: 6bd398d3707e diffcore-rename: compute dir_rename_counts in stages
@@ Commit message
augment the counts via calling update_dir_rename_counts() after each
basename-guide and inexact rename detection match is found.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
9: 752aff3a7995 ! 9: 46304aaebf5a diffcore-rename: limit dir_rename_counts computation to relevant dirs
@@ Commit message
info->relevant_source_dirs variable for this purpose, even though at
this stage we will only set it to dirs_removed for simplicity.
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
10: 65f7bfb735f2 ! 10: 4be565c47208 diffcore-rename: compute dir_rename_guess from dir_rename_counts
@@ Commit message
mega-renames: 188.754 s ± 0.284 s 130.465 s ± 0.259 s
just-one-mega: 5.599 s ± 0.019 s 3.958 s ± 0.010 s
+ Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
## diffcore-rename.c ##
--
gitgitgadget