git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v9 3/3] read-cache: speed up add_index_entry during checkout
Date: Fri, 14 Apr 2017 09:06:14 -0400	[thread overview]
Message-ID: <d233feed-2494-e68b-5fe6-4e1bd43ef423@jeffhostetler.com> (raw)
In-Reply-To: <xmqq37deqpyw.fsf@gitster.mtv.corp.google.com>



On 4/11/2017 11:12 PM, Junio C Hamano wrote:
> git@jeffhostetler.com writes:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach add_index_entry_with_check() and has_dir_name()
>> to see if the path of the new item is greater than the
>> last path in the index array before attempting to search
>> for it.
>>
>> During checkout, merge_working_tree() populates the new
>> index in sorted order, so this change will save at least 2
>> binary lookups per file.
>
> Smart and simple.
>
>> diff --git a/read-cache.c b/read-cache.c
>> index 97f13a1..a8ef823 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -918,9 +918,24 @@ static int has_dir_name(struct index_state *istate,
>>  	int stage = ce_stage(ce);
>>  	const char *name = ce->name;
>>  	const char *slash = name + ce_namelen(ce);
>> +	size_t len_eq_last;
>> +	int cmp_last = 0;
>> +
>> +	if (istate->cache_nr > 0) {
>> +		/*
>> +		 * Compare the entry's full path with the last path in the index.
>> +		 * If it sorts AFTER the last entry in the index and they have no
>> +		 * common prefix, then there cannot be any F/D name conflicts.
>> +		 */
>> +		cmp_last = strcmp_offset(name,
>> +			istate->cache[istate->cache_nr-1]->name,
>
> Style?  "istate->cache[istate->cache_nr - 1]->name"
>
>> +			&len_eq_last);
>> +		if (cmp_last > 0 && len_eq_last == 0)
>> +			return retval;
>> +	}
>
> Let me follow the logic aloud.  Say the last entry in the cache is
> "x/y".  If we came here with ce->name == "x", we need to worry about
> nuking the existing entry "x/y".  But if we have "zoo", that cannot
> possibly overlap and we can safely return 0.
>
> That sounds correct, except that it might be playing overly safe.
> If we came here with "xx", we still are safe, but len_eq_last would
> be non-zero.  Probably it is not worth the extra complexity to catch
> it here (rather than letting the loop below to handle it).
>
>>  	for (;;) {
>> -		int len;
>> +		size_t len;
>>
>>  		for (;;) {
>>  			if (*--slash == '/')
>> @@ -930,6 +945,24 @@ static int has_dir_name(struct index_state *istate,
>>  		}
>>  		len = slash - name;
>
> Mental note: cmp_last may be 0, >0 or <0 at this point in the very
> first iteration of the loop.  It is not updated in the loop.  The
> variable len_eq_last are used to carry the information about the
> last entry we learned at the beginning of this function---the new
> special case happens only when the path we are adding sorts later
> than the last existing entry (i.e. cmp_last > 0).
>
>> +		if (cmp_last > 0) {
>> +			/*
>> +			 * If this part of the directory prefix (including the trailing
>> +			 * slash) already appears in the path of the last entry in the
>> +			 * index, then we cannot also have a file with this prefix (or
>> +			 * any parent directory prefix).
>> +			 */
>> +			if (len+1 <= len_eq_last)
>
> Style?  "len + 1".
>
>> +				return retval;
>> +			/*
>> +			 * If this part of the directory prefix (excluding the trailing
>> +			 * slash) is longer than the known equal portions, then this part
>> +			 * of the prefix cannot collide with a file.  Go on to the parent.
>> +			 */
>> +			if (len > len_eq_last)
>> +				continue;
>
> Hmph, is the reasoning used in the two conditionals above sound?
> Does this work correctly even when the last existing entry in the
> cache is marked with CE_REMOVE?

I'll double check my math here.  I also want to think about
the "too conservative" comment above.

WRT if the last entry is CE_REMOVE, I think we are still OK
because we are asking if the given ce is strictly greater than
the last entry so that we can append it rather than searching
for a collision or insertion point.

Thanks,
Jeff

  reply	other threads:[~2017-04-14 13:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-11 19:16 [PATCH v9 0/3] read-cache: speed up add_index_entry git
2017-04-11 19:17 ` [PATCH v9 1/3] read-cache: add strcmp_offset function git
2017-04-11 19:17 ` [PATCH v9 2/3] p0006-read-tree-checkout: perf test to time read-tree git
2017-04-11 19:17 ` [PATCH v9 3/3] read-cache: speed up add_index_entry during checkout git
2017-04-12  3:12   ` Junio C Hamano
2017-04-14 13:06     ` Jeff Hostetler [this message]
2017-04-11 19:26 ` [PATCH v9 0/3] read-cache: speed up add_index_entry Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d233feed-2494-e68b-5fe6-4e1bd43ef423@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).