From: Junio C Hamano <gitster@pobox.com>
To: Kevin Willford <kcwillford@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Use path comparison for patch ids before the file content
Date: Thu, 14 Jul 2016 15:13:45 -0700
Message-ID: <xmqqy453ookm.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20160714201758.13180-1-kcwillford@gmail.com> (Kevin Willford's message of "Thu, 14 Jul 2016 16:17:58 -0400")
Kevin Willford <kcwillford@gmail.com> writes:
> When limiting the list in a revision walk using cherry pick, patch ids are
> calculated by producing the diff of the content of the files. This would
> be more efficient by using a patch id looking at the paths that were
> changed in the commits and only if all the files changed are the same fall
> back to getting the content of the files in the commits to determine if
> the commits are the same.
The basic idea of this change makes sense. When we have many
commits and can tell that no other commit changes the same set of
paths as this commit does, we immediately know that this commit
cannot have an equivalent among the rest. By
first computing a much cheaper "hash of touched paths" for commits,
and throwing them into separate bins keyed by the "hash of touched
paths", you can narrow the commits whose patch IDs must be compared,
and if a bin happens to be a singleton, you do not even need to
produce any patch ID by running a textual diff. I like it.
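The binning described above can be sketched roughly as follows. This is a
minimal illustration in Python, not the code in patch-ids.c: the commit
representation (a dict with "name", "paths" and "diff" keys) and the
function names cheap_id, full_id and find_equivalents are all hypothetical.

```python
import hashlib
from collections import defaultdict


def cheap_id(commit):
    """Cheap key: hash only the touched paths, never the file content.

    Assumption: a commit is a dict with "name", "paths" and "diff" keys.
    """
    h = hashlib.sha1()
    for path in sorted(commit["paths"]):
        h.update(path.encode() + b"\0")
    return h.hexdigest()


def full_id(commit):
    """Expensive key: stands in for a patch ID over the textual diff."""
    return hashlib.sha1(commit["diff"].encode()).hexdigest()


def find_equivalents(commits, cheap, full):
    """Return lists of commit names whose full IDs match.

    Commits are first binned by the cheap key. The expensive key is
    computed only inside bins that hold more than one commit.
    """
    bins = defaultdict(list)
    for commit in commits:
        bins[cheap(commit)].append(commit)

    groups = []
    for members in bins.values():
        if len(members) < 2:
            continue  # singleton bin: no textual diff is needed at all
        by_full = defaultdict(list)
        for commit in members:
            by_full[full(commit)].append(commit["name"])
        groups.extend(g for g in by_full.values() if len(g) > 1)
    return groups
```

With four commits where three touch a.c and one touches b.c, the commit
touching b.c lands in a singleton bin, so the expensive ID is computed for
only three of the four.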
Explaining this as "hash of touched paths" is somewhat misleading,
though. Your "use_path_only" mode hashes the entire basic diff
header and not just paths, so it can differentiate a commit that
adds a file from another commit that modifies the same file, for
example.
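To illustrate the difference between hashing paths and hashing the diff
header, here is a small Python sketch. The header lines it builds are only
an approximation chosen for illustration (the exact bytes the patch feeds
to the hash may differ), and header_lines, header_only_id and
paths_only_id are hypothetical names.

```python
import hashlib


def header_lines(path, status):
    """Illustrative diff header for one path.

    status is "add", "delete" or "modify". The file mode 100644 is an
    assumption made for the example.
    """
    lines = [f"diff --git a/{path} b/{path}"]
    if status == "add":
        lines += ["new file mode 100644", "--- /dev/null", f"+++ b/{path}"]
    elif status == "delete":
        lines += ["deleted file mode 100644", f"--- a/{path}", "+++ /dev/null"]
    elif status == "modify":
        lines += [f"--- a/{path}", f"+++ b/{path}"]
    else:
        raise ValueError(f"unknown status: {status}")
    return lines


def header_only_id(changes):
    """Hash the header lines of every (path, status) pair, no content."""
    h = hashlib.sha1()
    for path, status in sorted(changes):
        for line in header_lines(path, status):
            h.update(line.encode() + b"\n")
    return h.hexdigest()


def paths_only_id(changes):
    """Hash nothing but the paths, ignoring how each path was changed."""
    h = hashlib.sha1()
    for path, _status in sorted(changes):
        h.update(path.encode() + b"\0")
    return h.hexdigest()
```

A commit that adds hello.c and one that modifies hello.c get the same
paths_only_id but different header_only_id values, which is why the
header-based key separates more commits before any content is read.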
> ... This will speed up a rebase where the
> upstream has many changes but none of them have been pulled into the
> current branch.
> ---
Missing sign-off.
> diff.c | 16 +++++----
> diff.h | 2 +-
The changes in the above two files looked OK to me.
I didn't read the changes to the other three files carefully.
> patch-ids.c | 114 +++++++++++++++++++++++++++++-------------------------------
> patch-ids.h | 7 ++--
> revision.c | 19 ++--------
> 5 files changed, 73 insertions(+), 85 deletions(-)
>
> diff --git a/patch-ids.c b/patch-ids.c
> index a4d0016..f0262ce 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -4,8 +4,9 @@
> ...
> +}
> \ No newline at end of file
No newline at end of file.
Thanks.
Thread overview: 3+ messages
2016-07-14 20:17 [PATCH] Use path comparison for patch ids before the file content Kevin Willford
2016-07-14 22:13 ` Junio C Hamano [this message]
2016-07-15 11:59 ` Johannes Schindelin