From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Elijah Newren <newren@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
Date: Sun, 29 Apr 2018 22:53:11 +0200 (DST)
Message-ID: <nycvar.QRO.7.76.6.1804292251000.79@tvgsbejvaqbjf.bet>
In-Reply-To: <CACsJy8DyP_mXXJKn52Jzqe63N3GLpXePCr8ha97Lv9hr6u-M0w@mail.gmail.com>

Hi Duy,

On Sun, 29 Apr 2018, Duy Nguyen wrote:
> On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren <newren@gmail.com> wrote:
> > Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> > The code in unpack_trees() does not correctly handle them being different.
> > There are two separate issues:
> >
> > First, there is the possibility of memory corruption. Since
> > unpack_trees() creates a temporary index in o->result and then discards
> > o->dst_index and overwrites it with o->result, in the special case that
> > o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> > split_index for o->result. However, when src and dst are different,
> > reusing o->src_index's split_index for o->result will cause the
> > split_index to be shared. If either index then has entries replaced or
> > removed, it will result in the other index referring to free()'d memory.
> >
> > Second, we can drop the index extensions. Previously, we were moving
> > index extensions from o->dst_index to o->result. Since o->src_index is
> > the one that will have the necessary extensions (o->dst_index is likely to
> > be a new temporary index created to store the results), we should be
> > moving the index extensions from there.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >
> > Differences from v2:
> > - Don't NULLify src_index until we're done using it
> > - Actually built and tested[1]
> >
> > But it now passes the testsuite on both linux and mac[2], and I even re-merged
> > all 53288 merge commits in linux.git (with a merge of this patch together with
> > the directory rename detection series) for good measure. [Only 7 commits
> > showed a difference, all due to directory rename detection kicking in.]
> >
> > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> > parallelization are great until you realize that your new setup omitted a
> > critical step, leaving you running a slightly stale version of git instead...
> > :-(
> >
> > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> > with unicode normalization tests, but those two tests fail before my changes
> > too. All the other tests pass.
> >
> > unpack-trees.c | 19 +++++++++++++++----
> > 1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index e73745051e..49526d70aa 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > o->result.timestamp.sec = o->src_index->timestamp.sec;
> > o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> > o->result.version = o->src_index->version;
> > - o->result.split_index = o->src_index->split_index;
> > - if (o->result.split_index)
> > + if (!o->src_index->split_index) {
> > + o->result.split_index = NULL;
> > + } else if (o->src_index == o->dst_index) {
> > + /*
> > + * o->dst_index (and thus o->src_index) will be discarded
> > + * and overwritten with o->result at the end of this function,
> > + * so just use src_index's split_index to avoid having to
> > + * create a new one.
> > + */
> > + o->result.split_index = o->src_index->split_index;
> > o->result.split_index->refcount++;
> > + } else {
> > + o->result.split_index = init_split_index(&o->result);
> > + }
> > hashcpy(o->result.sha1, o->src_index->sha1);
> > o->merge_size = len;
> > mark_all_ce_unused(o->src_index);
> > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > }
> > }
> >
> > - o->src_index = NULL;
> > ret = check_updates(o) ? (-2) : 0;
> > if (o->dst_index) {
> > if (!ret) {
> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > WRITE_TREE_SILENT |
> > WRITE_TREE_REPAIR);
> > }
> > - move_index_extensions(&o->result, o->dst_index);
> > + move_index_extensions(&o->result, o->src_index);
>
> While this looks like the right thing to do on paper, I believe it's
> actually broken for a specific case of untracked cache. In short,
> please do not touch this line. I will send a patch to revert
> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
> which essentially deletes this line, with proper explanation and
> perhaps a test if I could come up with one.
>
> When we update the index, we depend on the fact that all updates must
> invalidate the right untracked cache correctly. In this unpack
> operation, we start copying entries over from src to result. Since
> 'result' (at least from the beginning) does not have an untracked
> cache, it has nothing to invalidate when we copy entries over. By the
> time we are done preparing 'result', what's recorded in src's (or
> dst's for that matter) untracked cache may or may not apply to
> 'result' index anymore. This copying only leads to more problems when
> untracked cache is used.

Is there really no way to invalidate just individual entries?

I have a couple of worktrees which are *huge*. And edf3b90553 really
helped relieve the pain a bit when running `git status`. Now you say that
even a `git checkout -b new-branch` would blow the untracked cache away
again?

It would be *really* nice if we could prevent that performance regression
somehow.

Ciao,
Dscho
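
To make the question about invalidating individual entries concrete, here
is a toy model in C of what per-path invalidation could look like. It is a
sketch under assumptions, not git's untracked-cache code: `struct toy_dir`
and `toy_invalidate_path()` are hypothetical names, the cache is reduced to
one validity flag per directory, and invalidating every ancestor directory
of the changed path is a simplification chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for an untracked cache (hypothetical, not git's layout):
 * one validity flag per cached directory.
 */
struct toy_dir {
	const char *path;	/* directory with trailing '/', "" for the root */
	int valid;		/* nonzero while the cached listing can be trusted */
};

/*
 * Invalidate only the cached directories that contain `path`, i.e. the
 * directory holding the entry and its ancestors. Unrelated directories
 * keep their cached state. Returns how many directories went from
 * valid to invalid.
 */
static int toy_invalidate_path(struct toy_dir *dirs, size_t nr,
			       const char *path)
{
	int invalidated = 0;
	size_t i;

	assert(dirs && path);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(dirs[i].path);

		/* The trailing '/' keeps "src/" from matching "srcfoo/x". */
		if (!strncmp(path, dirs[i].path, len) && dirs[i].valid) {
			dirs[i].valid = 0;
			invalidated++;
		}
	}
	return invalidated;
}
```

With cached directories "", "src/", "src/lib/" and "docs/", calling
toy_invalidate_path() for "src/lib/a.c" leaves "docs/" valid, so a later
scan would only have to revisit the directories on the changed path.
Whether the real cache can be kept consistent this way while 'result' is
being built is exactly the open question in this thread.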