git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Elijah Newren <newren@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index
Date: Sun, 29 Apr 2018 22:53:11 +0200 (DST)
Message-ID: <nycvar.QRO.7.76.6.1804292251000.79@tvgsbejvaqbjf.bet>
In-Reply-To: <CACsJy8DyP_mXXJKn52Jzqe63N3GLpXePCr8ha97Lv9hr6u-M0w@mail.gmail.com>

Hi Duy,

On Sun, 29 Apr 2018, Duy Nguyen wrote:

> On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren <newren@gmail.com> wrote:
> > Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> > The code in unpack_trees() does not correctly handle them being different.
> > There are two separate issues:
> >
> > First, there is the possibility of memory corruption.  Since
> > unpack_trees() creates a temporary index in o->result and then discards
> > o->dst_index and overwrites it with o->result, in the special case that
> > o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> > split_index for o->result.  However, when src and dst are different,
> > reusing o->src_index's split_index for o->result will cause the
> > split_index to be shared.  If either index then has entries replaced or
> > removed, it will result in the other index referring to free()'d memory.
> >
> > Second, we can drop the index extensions.  Previously, we were moving
> > index extensions from o->dst_index to o->result.  Since o->src_index is
> > the one that will have the necessary extensions (o->dst_index is likely to
> > be a new temporary index created to store the results), we should be
> > moving the index extensions from there.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >
> > Differences from v2:
> >   - Don't NULLify src_index until we're done using it
> >   - Actually built and tested[1]
> >
> > But it now passes the testsuite on both linux and mac[2], and I even re-merged
> > all 53288 merge commits in linux.git (with a merge of this patch together with
> > the directory rename detection series) for good measure.  [Only 7 commits
> > showed a difference, all due to directory rename detection kicking in.]
> >
> > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> > parallelization are great until you realize that your new setup omitted a
> > critical step, leaving you running a slightly stale version of git instead...
> > :-(
> >
> > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> > with unicode normalization tests, but those two tests fail before my changes
> > too.  All the other tests pass.
> >
> >  unpack-trees.c | 19 +++++++++++++++----
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index e73745051e..49526d70aa 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >         o->result.timestamp.sec = o->src_index->timestamp.sec;
> >         o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> >         o->result.version = o->src_index->version;
> > -       o->result.split_index = o->src_index->split_index;
> > -       if (o->result.split_index)
> > +       if (!o->src_index->split_index) {
> > +               o->result.split_index = NULL;
> > +       } else if (o->src_index == o->dst_index) {
> > +               /*
> > +                * o->dst_index (and thus o->src_index) will be discarded
> > +                * and overwritten with o->result at the end of this function,
> > +                * so just use src_index's split_index to avoid having to
> > +                * create a new one.
> > +                */
> > +               o->result.split_index = o->src_index->split_index;
> >                 o->result.split_index->refcount++;
> > +       } else {
> > +               o->result.split_index = init_split_index(&o->result);
> > +       }
> >         hashcpy(o->result.sha1, o->src_index->sha1);
> >         o->merge_size = len;
> >         mark_all_ce_unused(o->src_index);
> > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >                 }
> >         }
> >
> > -       o->src_index = NULL;
> >         ret = check_updates(o) ? (-2) : 0;
> >         if (o->dst_index) {
> >                 if (!ret) {
> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >                                                   WRITE_TREE_SILENT |
> >                                                   WRITE_TREE_REPAIR);
> >                 }
> > -               move_index_extensions(&o->result, o->dst_index);
> > +               move_index_extensions(&o->result, o->src_index);
> 
> While this looks like the right thing to do on paper, I believe it's
> actually broken for a specific case of untracked cache. In short,
> please do not touch this line. I will send a patch to revert
> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
> which essentially deletes this line, with proper explanation and
> perhaps a test if I could come up with one.
> 
> When we update the index, we depend on the fact that every update
> invalidates the right untracked cache correctly. In this unpack
> operation, we start copying entries over from src to result. Since
> 'result' (at least from the beginning) does not have an untracked
> cache, it has nothing to invalidate when we copy entries over. By the
> time we are done preparing 'result', what's recorded in src's (or
> dst's for that matter) untracked cache may or may not apply to the
> 'result' index anymore. This copying only leads to more problems when
> untracked cache is used.

Is there really no way to invalidate just individual entries?

I have a couple of worktrees which are *huge*. And edf3b90553 really
helped relieve the pain a bit when running `git status`. Now you say that
even a `git checkout -b new-branch` would blow the untracked cache away
again?

It would be *really* nice if we could prevent that performance regression
somehow.

Ciao,
Dscho
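
For readers following the split_index half of the quoted commit message,
here is a minimal sketch of the ownership decision the patch makes. The
toy_* types and functions are hypothetical stand-ins, not git's real
struct split_index, struct index_state or init_split_index(); the sketch
models only who owns the split index, not the entry-level aliasing that
causes the actual corruption.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; NOT git's real data structures. */
struct toy_split_index {
	int refcount;
};

struct toy_index {
	struct toy_split_index *split_index;
};

/* Allocate a fresh split index owned by exactly one index. */
static struct toy_split_index *toy_split_index_new(void)
{
	struct toy_split_index *si = calloc(1, sizeof(*si));
	if (!si)
		abort();
	si->refcount = 1;
	return si;
}

/* Drop this index's reference; free once the last owner lets go. */
static void toy_index_discard(struct toy_index *idx)
{
	struct toy_split_index *si = idx->split_index;

	idx->split_index = NULL;
	if (si && --si->refcount == 0)
		free(si);
}

/* Mirrors the three-way branch in the quoted hunk. */
static void toy_setup_result(struct toy_index *result,
			     const struct toy_index *src,
			     const struct toy_index *dst)
{
	if (!src->split_index) {
		result->split_index = NULL;
	} else if (src == dst) {
		/* dst is discarded and replaced by result, so sharing is safe. */
		result->split_index = src->split_index;
		result->split_index->refcount++;
	} else {
		/* src stays alive on its own, so result gets a private copy. */
		result->split_index = toy_split_index_new();
	}
}
```

With src != dst the two indexes never point at the same split index, so
discarding or rewriting one cannot leave the other with dangling
pointers; with src == dst the extra reference is dropped again when the
old index is discarded.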

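The untracked cache concern quoted above can be sketched the same way.
This is a toy model under loud assumptions: the toy_uc_* names are
hypothetical, and the real cache is invalidated per directory rather
than through a single flag. It only shows why an extension that is moved
over after the entries were copied never sees those updates.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT git's real data structures. */
struct toy_uc_cache {
	int valid;	/* 1 while the cached untracked listing is trusted */
};

struct toy_uc_index {
	int nr_entries;
	struct toy_uc_cache *untracked;
};

/* Every index update is expected to invalidate the attached cache. */
static void toy_uc_add_entry(struct toy_uc_index *idx)
{
	idx->nr_entries++;
	if (idx->untracked)
		idx->untracked->valid = 0;
}

/* Hand the extension from one index to another, as the patch's last hunk does. */
static void toy_uc_move_extension(struct toy_uc_index *dst,
				  struct toy_uc_index *src)
{
	dst->untracked = src->untracked;
	src->untracked = NULL;
}

/* Returns 1 if the result ends up with a cache that wrongly claims validity. */
static int toy_uc_stale_after_unpack(void)
{
	struct toy_uc_cache uc = { 1 };
	struct toy_uc_index src = { 3, &uc };
	struct toy_uc_index result = { 0, NULL };

	/* result has no cache yet, so this update invalidates nothing. */
	toy_uc_add_entry(&result);

	/* The cache arrives afterwards, still marked valid. */
	toy_uc_move_extension(&result, &src);

	return result.untracked->valid && result.nr_entries != 3;
}
```

In this model the only safe choices are to drop the cache or to
invalidate it when it is moved; whether invalidating just the affected
directories is feasible in git itself is the open question in this
thread.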