git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: "Jeff Hostetler" <git@jeffhostetler.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net,
	Jeff Hostetler <jeffhost@microsoft.com>,
	Brandon Williams <bwilliamseng@gmail.com>
Subject: Re: [PATCH v12 5/5] read-cache: speed up has_dir_name (part 2)
Date: Thu, 16 Jul 2020 19:11:48 +0200	[thread overview]
Message-ID: <6d769a78-72aa-5b5d-9cc7-41b8c0ddf9d8@web.de> (raw)
In-Reply-To: <be8dec3e-087e-b973-280d-3d0d2e481de6@jeffhostetler.com>

Am 08.07.20 um 15:49 schrieb Jeff Hostetler:
>
>
> On 7/6/20 2:39 AM, Junio C Hamano wrote:
>> SZEDER Gábor <szeder.dev@gmail.com> writes:
>>
>>>> +                 * last: xxx/yy-file (because '-' sorts before '/')
>>>> +                 * this: xxx/yy/abc
>>>
>>> This is problematic, because the index can already contain 'xxx/yy' as
>>> a file, when adding 'xxx/yy/abc', but since 'xxx/yy' as a file sorts
>>> before 'xxx/yy-file', the short-circuiting here doesn't see it and
>>> thus leaves the d-f collision undetected.  Consequently, even Git
>>> porcelain commands can create tree objects with duplicate entries, as
>>> demonstrated in the tests below.
>>
>> Yeah, the "optimization" is quite bogus.  Thanks for catching it.
>>
>
> yes, thanks!

OK, so now we just need to fix it.  Like this perhaps?

-- >8 --
From f4b2c70a34bb612e2ccc23a31e2ba8e01dedc373 Mon Sep 17 00:00:00 2001
Subject: [PATCH] read-cache: remove bogus shortcut

has_dir_name() has some optimizations for the case where entries are
added to an index in the correct order.  They kick in if the new entry
sorts after the last one.  One of them exits early if the last entry has
a longer name than the directory of the new entry.  Here's its comment:

/*
 * The directory prefix lines up with part of
 * a longer file or directory name, but sorts
 * after it, so this sub-directory cannot
 * collide with a file.
 *
 * last: xxx/yy-file (because '-' sorts before '/')
 * this: xxx/yy/abc
 */

However, a file named xxx/yy would be sorted before xxx/yy-file because
'-' sorts after NUL, so the length check against the last entry is not
sufficient to rule out a collision.  Remove it.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 read-cache.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index aa427c5c17..8ed1c29b54 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1171,20 +1171,6 @@ static int has_dir_name(struct index_state *istate,
 				return retval;
 			}

-			if (istate->cache_nr > 0 &&
-				ce_namelen(istate->cache[istate->cache_nr - 1]) > len) {
-				/*
-				 * The directory prefix lines up with part of
-				 * a longer file or directory name, but sorts
-				 * after it, so this sub-directory cannot
-				 * collide with a file.
-				 *
-				 * last: xxx/yy-file (because '-' sorts before '/')
-				 * this: xxx/yy/abc
-				 */
-				return retval;
-			}
-
 			/*
 			 * This is a possible collision. Fall through and
 			 * let the regular search code handle it.
--
2.27.0

      reply	other threads:[~2020-07-16 17:12 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-19 17:06 [PATCH v12 0/5] read-cache: speed up add_index_entry git
2017-04-19 17:06 ` [PATCH v12 1/5] read-cache: add strcmp_offset function git
2017-04-19 17:06 ` [PATCH v12 2/5] p0006-read-tree-checkout: perf test to time read-tree git
2017-04-19 17:06 ` [PATCH v12 3/5] read-cache: speed up add_index_entry during checkout git
2017-04-19 17:06 ` [PATCH v12 4/5] read-cache: speed up has_dir_name (part 1) git
2017-04-19 17:06 ` [PATCH v12 5/5] read-cache: speed up has_dir_name (part 2) git
2020-07-04 17:27   ` SZEDER Gábor
2020-07-05  5:27     ` René Scharfe
2020-07-06  6:39     ` Junio C Hamano
2020-07-08 13:49       ` Jeff Hostetler
2020-07-16 17:11         ` René Scharfe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d769a78-72aa-5b5d-9cc7-41b8c0ddf9d8@web.de \
    --to=l.s.r@web.de \
    --cc=bwilliamseng@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).